HTML vs Markdown for Agent Memory: A Full-Scale Benchmark on Accuracy, Latency and Cost

A few weeks ago, we compared HTML and Markdown on LoCoMo with 603 questions, and HTML won.

To check whether that result holds at production scale, we re-ran our agent memory benchmark at full scale: 1,982 questions across 11 conversations. The results: HTML beat Markdown on cost (50% cheaper), accuracy (+0.26 points), and speed (40% faster on query, 12.5% faster on curate).

The Question

Thariq from the Claude Code team had already argued that HTML is an unreasonably effective output format for AI agents because humans actually read it. If HTML helps when a human is the reader, does it also help when the agent is the reader of its own memory?

The Benchmark

This time we ran LoCoMo at full scale with 1,982 questions across every conversation and category. The setup was the same as the previous run, with two isolated context trees (one Markdown, one HTML) queried by the same agent.

HTML wins almost every category. Overall accuracy is very close, but the per-category breakdown shows where the two formats differ.

With HTML, multi-hop questions gain +1.06 points and temporal questions gain +1.25 points. Multi-hop questions require the agent to pull facts from multiple notes to find the answer. Temporal questions require the agent to put facts in the right time order. Both are among the hardest kinds of questions, because the agent has to connect several things together.
Markdown won only on adversarial questions (−1.35 points for HTML). Adversarial questions are designed to trick the agent with confusing wording that nudges it toward the wrong answer.

The differences in cost and token usage are much larger:

Query input tokens drop by 68%, query cost drops by 68%, and total cost drops by 50%.
For latency, HTML's slowest queries (p95) are much faster than Markdown's (259 ms versus 4,311 ms).
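For readers less familiar with p95: it is the latency that 95% of queries stay at or under, so it describes the slow tail rather than the typical query. The sketch below computes it with the nearest-rank method, which is one common definition and may not be the one our benchmark tooling uses. The sample latencies are made up purely for illustration and are not benchmark data.

```python
import math
import statistics


def p95_nearest_rank(latencies_ms):
    """Return the 95th percentile using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # 1-based rank of the smallest value that covers 95% of the samples.
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical samples (not benchmark data): 18 fast queries and 2 slow ones.
samples_ms = [120, 130, 125, 140, 135, 128, 132, 138, 127, 131,
              129, 133, 126, 137, 134, 136, 124, 1900, 123, 2000]

median_ms = statistics.median(samples_ms)
p95_ms = p95_nearest_rank(samples_ms)
print(f"median={median_ms} ms, p95={p95_ms} ms")
```

In this toy data the median stays close to the fast queries while the p95 lands on one of the slow ones, which is why the two numbers are worth reporting separately.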

Core Values of HTML Format

Higher accuracy at lower cost:

HTML gives you better answers for half the cost, and we now have the full-scale numbers to prove it.

The earlier 603-question test already showed HTML beating Markdown on cost and accuracy. To validate this result at full scale, we re-ran the full LoCoMo benchmark with 1,982 questions across 11 conversations. The result: HTML scored 90.77% accuracy at a total cost of $4.11, while Markdown scored 90.51% at $8.30.
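The headline figures can be re-derived from the four numbers in this paragraph. The snippet below is only arithmetic on the reported accuracy and cost; the dictionary layout and the percent_reduction helper are illustrative names, not part of the benchmark code.

```python
def percent_reduction(baseline, new):
    """Percentage by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline * 100


# Figures reported above for the full-scale run.
markdown = {"accuracy_pct": 90.51, "total_cost_usd": 8.30}
html = {"accuracy_pct": 90.77, "total_cost_usd": 4.11}

accuracy_delta = round(html["accuracy_pct"] - markdown["accuracy_pct"], 2)
cost_drop = round(
    percent_reduction(markdown["total_cost_usd"], html["total_cost_usd"]), 1
)

print(f"accuracy delta: +{accuracy_delta} points")
print(f"total cost reduction: {cost_drop}%")
```

The accuracy difference works out to 0.26 points and the cost reduction to roughly half, matching the summary at the top of the post.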

Anyone with a browser can open HTML without extra tools:

An HTML file opens directly in any browser, with no extra tool needed. Markdown files, by contrast, need a separate viewer or editor to render properly, and most browsers cannot display them on their own.

Agents get the exact fact you need:

HTML files follow a strict structure that every browser and every agent already understands. On top of that, ByteRover adds typed elements like <bv-rule>, <bv-decision>, and <bv-bug>, which let an agent search the memory tree the same way you would search a database.
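As a rough illustration of what searching by element type can look like, the sketch below uses Python's standard-library html.parser to pull every <bv-rule> out of a note. Only the tag names come from this post. The sample note, the <article> wrapper, and the find_typed helper are hypothetical, and ByteRover's actual file layout and query mechanism may differ.

```python
from html.parser import HTMLParser


class TypedElementCollector(HTMLParser):
    """Collect the text inside every element with a given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text pieces of the current match
        self.matches = []   # finished matches, in document order

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.buffer = []

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.matches.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def find_typed(html_text, tag):
    """Return the text of every <tag> element in html_text."""
    collector = TypedElementCollector(tag)
    collector.feed(html_text)
    collector.close()
    return collector.matches


# Hypothetical memory note; the content is invented for this example.
MEMORY_NOTE = """
<article>
  <bv-rule>Run the linter before every commit.</bv-rule>
  <bv-decision>Use PostgreSQL for the session store.</bv-decision>
  <bv-bug>Login fails when the token has expired.</bv-bug>
  <bv-rule>Never log raw access tokens.</bv-rule>
</article>
"""

rules = find_typed(MEMORY_NOTE, "bv-rule")
print(rules)
```

The point of the sketch is that the element name acts like a column filter: asking for "bv-rule" returns the two rules and skips the decision and the bug, without any text matching on the note's wording.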

Works inside new cross-agent protocols:

Cross-agent protocols like A2A (Agent-to-Agent) and MCP (Model Context Protocol) need standardized input and output between agents, and HTML provides exactly that.

These protocols standardize the envelope that agents pass to each other. HTML memory standardizes what goes inside that envelope, so both sides of the handoff are structured instead of free-form text.

Conclusion

To conclude, HTML is not only more readable for humans but also more effective for agent memory (in accuracy, cost, and latency), as the results of this full-scale benchmark show. As a next step, we are shipping HTML as the default format for ByteRover memory, which will be released soon.
