Here's the current state of the art:
| Format | Tokens vs JSON | LLM Accuracy |
|---|---|---|
| TOON | ~40% fewer | 73.9% (best for tabular) |
| Markdown | 34-38% fewer | Good (cost-optimized) |
| YAML | ~25-30% fewer | Best on some models (62.1% on GPT-5 Nano) |
| JSON | baseline | 69.7% |
| XML | 80% more | Worst |
Most people are just converting JSON to YAML. It's a lossless swap that drops ~25-30% of tokens by eliminating braces, brackets, and most quotes. LLMs actually understand YAML better than JSON on most models (except Llama).
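The swap is mechanical. Here's a minimal sketch of the idea (in practice you'd use a real YAML library like PyYAML; this toy emitter skips quoting, anchors, and edge cases like strings that look like numbers):

```python
import json

def to_yaml(value, indent=0):
    # Minimal JSON -> YAML emitter: dicts, lists, and scalars only.
    # A sketch of why YAML is smaller: no braces, brackets, or quotes.
    pad = "  " * indent
    if isinstance(value, dict):
        lines = []
        for key, val in value.items():
            if isinstance(val, (dict, list)) and val:
                lines.append(f"{pad}{key}:")
                lines.append(to_yaml(val, indent + 1))
            else:
                lines.append(f"{pad}{key}: {val}")
        return "\n".join(lines)
    if isinstance(value, list):
        lines = []
        for item in value:
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}-")
                lines.append(to_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}- {item}")
        return "\n".join(lines)
    return f"{pad}{value}"

doc = json.loads('{"name": "Blue Lake Trail", "distance": 7.5, "tags": ["easy", "lake"]}')
print(to_yaml(doc))
```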
TOON (Token-Oriented Object Notation) launched Nov 2025. It's specifically designed for feeding structured data to LLMs. Example:
```
# JSON: 106 tokens
{"hikes":[{"id":1,"name":"Blue Lake Trail","distance":7.5},{"id":2,"name":"Ridge Overlook","distance":9.2}]}

# TOON: ~40% fewer tokens
hikes[2]{id,name,distance}:
1,Blue Lake Trail,7.5
2,Ridge Overlook,9.2
```
It declares the schema once, then streams rows, like CSV meets YAML. It works great for uniform arrays (game data, API responses, tables) but loses its advantage for deeply nested or heterogeneous structures.
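The header-then-rows idea is simple enough to sketch in a few lines. This illustrates the encoding shown above, not the full TOON spec (it assumes every object has the same keys and no values contain commas):

```python
def to_toon(name, rows):
    # TOON-style encoding for a uniform array of objects:
    # declare the field list once in a header line, then emit
    # one comma-joined row per object (no per-row key repetition).
    fields = list(rows[0].keys())
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    body = [",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + body)

hikes = [
    {"id": 1, "name": "Blue Lake Trail", "distance": 7.5},
    {"id": 2, "name": "Ridge Overlook", "distance": 9.2},
]
print(to_toon("hikes", hikes))
```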
For tabular data specifically, plain markdown tables are the most token-efficient and models understand them well since training data is full of markdown.
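A markdown table can be generated from the same kind of uniform rows just as mechanically (a sketch; it assumes identical keys per row and no `|` characters in values):

```python
def to_markdown_table(rows):
    # Render a uniform list of dicts as a markdown table:
    # one header row, one separator row, one line per record.
    fields = list(rows[0].keys())
    lines = [
        "| " + " | ".join(fields) + " |",
        "|" + "---|" * len(fields),
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[f]) for f in fields) + " |")
    return "\n".join(lines)
```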
- Tabular/uniform data → TOON or markdown tables
- Nested structured data → YAML
- Need structured output back from LLM → JSON (models are better at generating valid JSON due to constrained decoding support)
- Avoid XML — worst of both worlds (most tokens, lowest accuracy)
For SpaceMolt's game state messages going to LLM players, YAML or TOON could meaningfully cut token costs. But since the game protocol also needs to be parseable by ordinary clients, the practical move is probably JSON on the wire plus a YAML/TOON conversion step when the client builds its prompt.
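That split could look like this on the client side: JSON stays the wire format, and uniform arrays get compacted into TOON-style rows only when assembling the prompt. The `tick`/`ships` fields here are hypothetical, not SpaceMolt's actual schema:

```python
import json

def game_state_to_prompt(wire_msg):
    # Parse the canonical JSON wire message, then re-render uniform
    # arrays of objects as compact header-plus-rows blocks for the
    # LLM prompt; everything else falls back to "key: value" lines.
    state = json.loads(wire_msg)
    out = []
    for key, value in state.items():
        if isinstance(value, list) and value and isinstance(value[0], dict):
            fields = list(value[0].keys())
            out.append(f"{key}[{len(value)}]{{{','.join(fields)}}}:")
            out.extend(",".join(str(r[f]) for f in fields) for r in value)
        else:
            out.append(f"{key}: {value}")
    return "\n".join(out)

msg = '{"tick": 42, "ships": [{"id": 1, "x": 3}, {"id": 2, "x": 7}]}'
print(game_state_to_prompt(msg))
```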