@statico
Last active February 13, 2026 03:35
Token Efficiency: JSON Alternatives for LLMs

Here's the current state of the art:

The Rankings (tokens for same data)

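| Format | Token reduction vs JSON | LLM accuracy |
|---|---|---|
| TOON | ~40% fewer | 73.9% (best for tabular) |
| Markdown | 34-38% fewer | Good (cost-optimized) |
| YAML | ~25-30% fewer | Best accuracy (62.1% on GPT-5 Nano) |
| JSON | baseline | 69.7% |
| XML | ~80% more | Worst |

The accuracy figures appear to come from different benchmarks (the YAML number is specific to GPT-5 Nano), so they are not directly comparable across rows.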

YAML — the easy win

Most people just convert JSON to YAML. It's a lossless swap that drops ~25-30% of tokens by eliminating braces, brackets, and most quotes. Most models (Llama is the exception) also read YAML more accurately than JSON.
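To make the swap concrete, here is a minimal stdlib-only sketch of a JSON-to-YAML renderer. The function names (`scalar`, `to_yaml_lines`, `to_yaml`) and the quoting rule are assumptions made for illustration, not any library's behavior: strings are left unquoted only when they are obviously safe, and anything else falls back to JSON quoting, which YAML 1.2 parsers also accept.

```python
import json
import re

# Strings matching this pattern are written unquoted. Everything else is
# JSON-quoted, which YAML 1.2 parsers also accept.
_PLAIN = re.compile(r"[A-Za-z_][A-Za-z0-9_ ./-]*")
# Words a YAML parser could read as a boolean or null if left unquoted.
_RESERVED = {"true", "false", "null", "yes", "no", "on", "off", "y", "n"}


def scalar(value):
    """Render a JSON scalar (or an empty container) as a YAML scalar."""
    if isinstance(value, str):
        plain = (
            _PLAIN.fullmatch(value) is not None
            and value == value.strip()
            and value.lower() not in _RESERVED
        )
        return value if plain else json.dumps(value)
    return json.dumps(value)  # numbers, true/false, null, {} and []


def to_yaml_lines(value, indent=0):
    """Render parsed JSON as a list of block-style YAML lines."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict) and value:
        for key, item in value.items():
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}{scalar(key)}:")
                lines.extend(to_yaml_lines(item, indent + 1))
            else:
                lines.append(f"{pad}{scalar(key)}: {scalar(item)}")
    elif isinstance(value, list) and value:
        for item in value:
            child = to_yaml_lines(item, indent + 1)
            # Put the "- " marker where the child's last indent level was.
            child[0] = f"{pad}- {child[0][len(pad) + 2:]}"
            lines.extend(child)
    else:
        lines.append(f"{pad}{scalar(value)}")
    return lines


def to_yaml(value):
    return "\n".join(to_yaml_lines(value))


data = json.loads(
    '{"hikes":[{"id":1,"name":"Blue Lake Trail","distance":7.5},'
    '{"id":2,"name":"Ridge Overlook","distance":9.2}]}'
)
print(to_yaml(data))
```

In practice a maintained library such as PyYAML (`yaml.safe_dump`) is the safer choice; the sketch only shows where the savings come from.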

TOON — purpose-built for LLMs

TOON (Token-Oriented Object Notation) launched Nov 2025. It's specifically designed for feeding structured data to LLMs. Example:

# JSON: 106 tokens
{"hikes":[{"id":1,"name":"Blue Lake Trail","distance":7.5},{"id":2,"name":"Ridge Overlook","distance":9.2}]}

# TOON: ~40% fewer tokens
hikes[2]{id,name,distance}:
  1,Blue Lake Trail,7.5
  2,Ridge Overlook,9.2

It declares the schema once, then streams rows, like CSV meets YAML. It works great for uniform arrays (game data, API responses, tables) but loses its advantage for deeply nested or heterogeneous structures.
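The "schema once, then rows" idea can be sketched in a few lines. This is a hand-rolled illustration that reproduces only the tabular form shown in the example above; `to_toon_table` and `cell` are hypothetical names, and the quoting rules are an assumption, not taken from the TOON spec.

```python
import json
import re

_NUMBER = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")


def cell(value):
    """Render one value for a row. Quoting rules here are an assumption."""
    if isinstance(value, (dict, list)):
        raise ValueError("nested values are outside this sketch")
    if not isinstance(value, str):
        return json.dumps(value)  # numbers, true/false, null
    looks_like_literal = (
        value in ("true", "false", "null") or _NUMBER.fullmatch(value) is not None
    )
    needs_quotes = (
        value == ""
        or value != value.strip()
        or any(ch in value for ch in ',:"\n')
        or looks_like_literal
    )
    return json.dumps(value) if needs_quotes else value


def to_toon_table(name, rows):
    """Encode a uniform list of flat dicts as a header line plus one line per row."""
    if not rows:
        raise ValueError("need at least one row")
    fields = list(rows[0])
    for row in rows:
        if list(row) != fields:
            raise ValueError("rows must share the same keys in the same order")
    # The header declares the row count and the field names exactly once.
    header = name + "[" + str(len(rows)) + "]{" + ",".join(fields) + "}:"
    body = ["  " + ",".join(cell(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])


hikes = [
    {"id": 1, "name": "Blue Lake Trail", "distance": 7.5},
    {"id": 2, "name": "Ridge Overlook", "distance": 9.2},
]
print(to_toon_table("hikes", hikes))
```

The sketch rejects rows with differing keys, which is exactly the case where the format stops paying off: each row would need its own field names again.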

Markdown tables — surprisingly good

For tabular data specifically, plain markdown tables are among the most token-efficient options (34-38% fewer tokens than JSON), and models understand them well since training data is full of markdown.
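Rendering a list of records as a markdown table takes very little code. The helper below is a minimal sketch with a hypothetical name (`to_markdown_table`); it takes the column order from the first row and converts every value with `str()`, so the original JSON types are not recoverable from the output.

```python
def to_markdown_table(rows):
    """Render a list of flat dicts as a pipe-delimited markdown table."""
    if not rows:
        raise ValueError("need at least one row")
    fields = list(rows[0])

    def fmt(value):
        # Escape pipes and flatten newlines so a cell cannot break the row.
        return str(value).replace("|", "\\|").replace("\n", " ")

    header = "| " + " | ".join(fields) + " |"
    divider = "| " + " | ".join("---" for _ in fields) + " |"
    body = [
        "| " + " | ".join(fmt(row.get(f, "")) for f in fields) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])


hikes = [
    {"id": 1, "name": "Blue Lake Trail", "distance": 7.5},
    {"id": 2, "name": "Ridge Overlook", "distance": 9.2},
]
print(to_markdown_table(hikes))
```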

Practical advice

  • Tabular/uniform data → TOON or markdown tables
  • Nested structured data → YAML
  • Need structured output back from LLM → JSON (constrained decoding support makes models more reliable at generating valid JSON)
  • Avoid XML — worst of both worlds (most tokens, lowest accuracy)
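The input-side rules above can be sketched as a small shape check. `is_uniform_table` and `pick_input_format` are hypothetical helpers, and the definition of "uniform" used here (a non-empty list of flat dicts with identical keys) is an assumption; output coming back from the model stays JSON either way.

```python
def is_uniform_table(value):
    """True for a non-empty list of flat, non-empty dicts with identical keys."""
    if not isinstance(value, list) or not value:
        return False
    if not all(isinstance(row, dict) and row for row in value):
        return False
    keys = set(value[0])
    return all(
        set(row) == keys
        and not any(isinstance(v, (dict, list)) for v in row.values())
        for row in value
    )


def pick_input_format(value):
    """Suggest a prompt format: a table for uniform rows, YAML for the rest."""
    return "table" if is_uniform_table(value) else "yaml"


data = {
    "hikes": [
        {"id": 1, "name": "Blue Lake Trail", "distance": 7.5},
        {"id": 2, "name": "Ridge Overlook", "distance": 9.2},
    ]
}
print(pick_input_format(data["hikes"]))  # the uniform array itself
print(pick_input_format(data))           # the wrapping object
```

Note that the check is applied to the array, not to the object that wraps it; a mixed payload would need the check run per field.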

For SpaceMolt's game state messages going to LLM players, YAML or TOON could meaningfully cut token costs. Since the game protocol also needs to be parseable by clients, though, the practical move is probably JSON for the wire format, converted to YAML/TOON when the client builds the prompt.
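A rough sketch of that split: the client parses the JSON wire message as usual and only changes the representation when it builds the prompt. The message shape below (`tick`, `location`, `cargo`) is invented for illustration and is not SpaceMolt's actual protocol, and `render_for_prompt` is a hypothetical helper. Each top-level key becomes one `key: value` line with the value kept in JSON syntax, which is also valid YAML as long as the keys are simple identifiers.

```python
import json


def render_for_prompt(wire_message):
    """Turn a JSON object from the wire into compact key: value lines."""
    state = json.loads(wire_message)  # the wire format stays plain JSON
    if not isinstance(state, dict):
        raise ValueError("expected a JSON object at the top level")
    # Assumes keys are simple identifiers that need no quoting.
    return "\n".join(f"{key}: {json.dumps(value)}" for key, value in state.items())


# Hypothetical game-state message, for illustration only.
wire = '{"tick": 42, "location": "Sol", "cargo": ["ore", "fuel"]}'
print(render_for_prompt(wire))
```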
