Token-Oriented Object Notation (TOON) is a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage. It's intended for LLM input, not output.
TOON's sweet spot is uniform arrays of objects – multiple fields per row, same structure across items. It borrows YAML's indentation-based structure for nested objects and CSV's tabular format for uniform data rows, then optimizes both for token efficiency in LLM contexts. For deeply nested or non-uniform data, JSON may be more efficient.
💡 Tip: Think of TOON as a translation layer: use JSON programmatically, convert to TOON for LLM input.
- Why TOON?
- Key Features
- Benchmarks
- Installation & Quick Start
- Format Overview
- API
- Using TOON in LLM Prompts
- Notes and Limitations
- Full Specification
- Other Implementations
- License
AI is becoming cheaper and more accessible, but larger context windows allow for larger data inputs as well. LLM tokens still cost money – and standard JSON is verbose and token-expensive:
{
"users": [
{ "id": 1, "name": "Alice", "role": "admin" },
{ "id": 2, "name": "Bob", "role": "user" }
]
}
TOON conveys the same information with fewer tokens:
users[2]{id,name,role}:
1,Alice,admin
2,Bob,user
Why create a new format? Because existing alternatives don't fit:
- JSON: Too verbose for tabular data
- CSV: No nested structures
- YAML: Better than JSON, but still repeats keys
- Protocol Buffers/MessagePack: Binary formats requiring schema definitions
TOON bridges these gaps with a text format optimized for token efficiency and LLM-friendly guardrails.
- 💸 Token-efficient: typically 30–60% fewer tokens than JSON
- 🤿 LLM-friendly guardrails: explicit lengths and fields enable validation
- 🍱 Minimal syntax: removes redundant punctuation (braces, brackets, most quotes)
- 📐 Indentation-based structure: like YAML, uses whitespace instead of braces
- 🧺 Tabular arrays: declare keys once, stream data as rows
⭐ GitHub Repositories ██████████████░░░░░░░░░░░ 8,745 tokens
vs JSON (-42.3%) 15,145
vs JSON compact (-23.7%) 11,455
vs YAML (-33.4%) 13,129
vs XML (-48.8%) 17,095
📈 Daily Analytics ██████████░░░░░░░░░░░░░░░ 4,507 tokens
vs JSON (-58.9%) 10,977
vs JSON compact (-35.7%) 7,013
vs YAML (-48.8%) 8,810
vs XML (-65.7%) 13,128
🛒 E-Commerce Order ████████████████░░░░░░░░░ 166 tokens
vs JSON (-35.4%) 257
vs JSON compact (-2.9%) 171
vs YAML (-15.7%) 197
vs XML (-38.7%) 271
─────────────────────────────────────────────────────────────────────
Total ██████████████░░░░░░░░░░░ 13,418 tokens
vs JSON (-49.1%) 26,379
vs JSON compact (-28.0%) 18,639
vs YAML (-39.4%) 22,136
vs XML (-56.0%) 30,494
Note: Token savings are measured against formatted JSON (2-space indentation). Measured with gpt-tokenizer using o200k_base encoding (GPT-5 tokenizer). Actual savings vary by model and tokenizer.
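The percentages in the chart look like relative reductions, computed as (1 - TOON tokens / baseline tokens) × 100 and rounded to one decimal. A quick sanity check under that assumption, using counts copied from the chart (savings_pct is a hypothetical helper, not part of toon_format):

```python
def savings_pct(toon_tokens: int, baseline_tokens: int) -> float:
    """Percent fewer tokens than the baseline, rounded to one decimal."""
    return round((1 - toon_tokens / baseline_tokens) * 100, 1)


# Token counts copied from the benchmark chart
print(savings_pct(8_745, 15_145))   # GitHub Repositories vs JSON: 42.3
print(savings_pct(4_507, 7_013))    # Daily Analytics vs JSON compact: 35.7
print(savings_pct(13_418, 26_379))  # Total vs JSON: 49.1
```

Every row of the chart checked this way reproduces the printed percentage, which suggests the figures are simple ratios of total token counts.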
# Clone the repository
git clone https://github.com/ronak/py-toon.git
cd py-toon
# Install locally
pip install -e .
# Or install directly from GitHub
pip install git+https://github.com/ronak/py-toon.git
from toon_format import encode, decode
# Encode Python data to TOON
data = {
"users": [
{"id": 1, "name": "Alice", "role": "admin"},
{"id": 2, "name": "Bob", "role": "user"}
]
}
toon_string = encode(data)
print(toon_string)
# users[2]{id,name,role}:
# 1,Alice,admin
# 2,Bob,user
# Decode TOON back to Python
original_data = decode(toon_string)
print(original_data)
# {'users': [{'id': 1, 'name': 'Alice', 'role': 'admin'}, ...]}
Simple objects with primitive values:
encode({
"id": 123,
"name": "Ada",
"active": True
})
id: 123
name: Ada
active: true
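Booleans and None are written as the JSON-style literals true, false, and null. The sketch below shows that mapping for a flat object; format_primitive and encode_flat are hypothetical names, and the string-quoting rules described later are deliberately left out, so this is not the library's encoder:

```python
def format_primitive(value) -> str:
    """Render a Python primitive as it appears in the examples (no string quoting)."""
    if isinstance(value, bool):  # True/False would otherwise print as "True"/"False"
        return "true" if value else "false"
    if value is None:
        return "null"
    return str(value)  # ints, floats, and plain strings pass through


def encode_flat(obj: dict) -> str:
    """One "key: value" line per field, in insertion order."""
    return "\n".join(f"{key}: {format_primitive(val)}" for key, val in obj.items())


print(encode_flat({"id": 123, "name": "Ada", "active": True}))
```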
Nested objects:
encode({
"user": {
"id": 123,
"name": "Ada"
}
})
user:
id: 123
name: Ada
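Nesting is expressed purely through indentation: a dict value gets a bare "key:" line and its fields move one level deeper. A minimal recursive sketch, assuming 2-space indentation and ignoring arrays and string quoting (encode_object and _scalar are hypothetical helpers):

```python
def _scalar(value) -> str:
    """Booleans and None as true/false/null; everything else via str()."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return "null" if value is None else str(value)


def encode_object(obj: dict, indent: int = 2, depth: int = 0) -> str:
    """Nested dicts become indented blocks under a bare "key:" line."""
    pad = " " * (indent * depth)
    lines = []
    for key, val in obj.items():
        if isinstance(val, dict):
            lines.append(f"{pad}{key}:")  # header line for the nested object
            if val:  # an empty dict leaves only the bare "key:" line
                lines.append(encode_object(val, indent, depth + 1))
        else:
            lines.append(f"{pad}{key}: {_scalar(val)}")
    return "\n".join(lines)


print(encode_object({"user": {"id": 123, "name": "Ada"}}))
```

An empty dict produces just the "key:" line, which matches the config: example for empty objects.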
Primitive arrays are written inline:
encode({"tags": ["admin", "ops", "dev"]})
tags[3]: admin,ops,dev
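Primitive arrays put the length in brackets and the values on the same line. A sketch assuming comma delimiting and unquoted values, with length_marker mirroring the encode() option described under API (encode_inline_array is a hypothetical helper, not the library's code):

```python
def encode_inline_array(key: str, values: list, length_marker: bool = False) -> str:
    """key[N]: v1,v2,... with comma delimiting and unquoted values."""
    marker = "#" if length_marker else ""  # optional "#" before the length
    header = f"{key}[{marker}{len(values)}]:"
    if not values:
        return header  # empty arrays keep only the header, e.g. items[0]:
    return header + " " + ",".join(str(v) for v in values)


print(encode_inline_array("tags", ["admin", "ops", "dev"]))
print(encode_inline_array("tags", ["admin", "ops", "dev"], length_marker=True))
```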
When all objects share the same primitive fields, TOON uses an efficient tabular format:
encode({
"items": [
{"sku": "A1", "qty": 2, "price": 9.99},
{"sku": "B2", "qty": 1, "price": 14.5}
]
})
items[2]{sku,qty,price}:
A1,2,9.99
B2,1,14.5
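The tabular form states the row count and field names once, then emits one delimited row per object. A sketch assuming a non-empty list of uniform, flat rows, comma delimiting, and rows indented 2 spaces below the header (encode_tabular is a hypothetical helper):

```python
def encode_tabular(key: str, rows: list, indent: int = 2) -> str:
    """Declare N and the field names once, then one comma-separated row per object."""
    fields = list(rows[0].keys())  # assumes every row has these same keys
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    pad = " " * indent
    body = [pad + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])


items = [
    {"sku": "A1", "qty": 2, "price": 9.99},
    {"sku": "B2", "qty": 1, "price": 14.5},
]
print(encode_tabular("items", items))
```

The keys appear once in the header no matter how many rows follow, which is where most of the savings over JSON come from.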
Arrays that don't meet the tabular requirements use list format:
items[3]:
- 1
- a: 1
- text
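Whether an array is tabular or list format comes down to a uniformity check. One way to express the stated rule (same keys, primitive values only) as a predicate; is_tabular is hypothetical and may differ from the library in edge cases:

```python
PRIMITIVES = (str, int, float, bool, type(None))


def is_tabular(items: list) -> bool:
    """Non-empty list of dicts with identical key sets and only primitive values."""
    if not items or not all(isinstance(item, dict) for item in items):
        return False
    keys = set(items[0])
    if not keys:
        return False  # objects with no fields have nothing to put in a header
    return all(
        set(item) == keys and all(isinstance(v, PRIMITIVES) for v in item.values())
        for item in items
    )


print(is_tabular([{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]))  # True
print(is_tabular([1, {"a": 1}, "text"]))  # False: mixed element types
```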
When objects appear in list format, the first field is placed on the hyphen line:
items[2]:
- id: 1
name: First
- id: 2
name: Second
extra: true
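A sketch of list-format rendering for flat objects, assuming 2-space indentation so that later fields line up under the first field on the hyphen line (encode_object_list and _scalar are hypothetical helpers, not toon_format functions):

```python
def _scalar(value) -> str:
    """Booleans and None as true/false/null; everything else via str()."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return "null" if value is None else str(value)


def encode_object_list(key: str, items: list) -> str:
    """Flat dicts in list format with 2-space indentation."""
    lines = [f"{key}[{len(items)}]:"]
    for obj in items:
        for i, (k, v) in enumerate(obj.items()):
            # The first field shares the "- " line; the rest line up under it
            prefix = "  - " if i == 0 else "    "
            lines.append(f"{prefix}{k}: {_scalar(v)}")
    return "\n".join(lines)


print(encode_object_list("items", [
    {"id": 1, "name": "First"},
    {"id": 2, "name": "Second", "extra": True},
]))
```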
Arrays of arrays:
encode({"pairs": [[1, 2], [3, 4]]})
pairs[2]:
- [2]: 1,2
- [2]: 3,4
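Each inner array reuses the inline "[N]:" form behind a list hyphen. A sketch assuming primitive inner values and comma delimiting (encode_array_of_arrays is a hypothetical helper):

```python
def encode_array_of_arrays(key: str, arrays: list) -> str:
    """Each inner primitive array becomes a "- [N]: ..." list item."""
    lines = [f"{key}[{len(arrays)}]:"]
    for inner in arrays:
        item = f"  - [{len(inner)}]:"
        if inner:  # an empty inner array keeps only its "[0]:" header
            item += " " + ",".join(str(v) for v in inner)
        lines.append(item)
    return "\n".join(lines)


print(encode_array_of_arrays("pairs", [[1, 2], [3, 4]]))
```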
Empty containers:
encode({"items": []}) # items[0]:
encode([]) # [0]:
encode({}) # (empty output)
encode({"config": {}}) # config:
TOON quotes strings only when necessary to maximize token efficiency:
- Inner spaces are allowed; leading or trailing spaces force quotes
- Unicode and emoji are safe unquoted
- Quotes and control characters are escaped with backslash
String values are quoted when:
- Empty string: ""
- Leading or trailing spaces: " padded ", " "
- Contains active delimiter, colon, quote, backslash, or control chars: "a,b", "a:b", "say \"hi\""
- Looks like boolean/number/null: "true", "42", "null"
- Starts with "- " (list-like): "- item"
- Looks like structural token: "[5]", "{key}"
Object keys are unquoted if they match the identifier pattern (start with letter or underscore, followed by letters, digits, underscores, or dots). All other keys must be quoted.
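Taken together, the string and key rules above can be approximated in a few functions. This is a sketch, not toon_format's implementation: needs_quotes, quote, format_string, and format_key are hypothetical names, the numeric pattern and the "structural token" check (text wrapped in brackets or braces) are assumptions, "letter" is taken to mean ASCII, and only backslash, double quote, newline, carriage return, and tab are escaped.

```python
import re

# Rough numeric-literal pattern (ints, decimals, exponents); an assumption
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")
# Letter or underscore first, then letters, digits, underscores, or dots (ASCII assumed)
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def needs_quotes(s: str, delimiter: str = ",") -> bool:
    """Apply the listed quoting rules to one string value."""
    if s == "":
        return True
    if s[0] == " " or s[-1] == " ":  # leading or trailing spaces
        return True
    if any(ch in s for ch in (delimiter, ":", '"', "\\")):
        return True
    if any(ord(ch) < 0x20 for ch in s):  # control characters
        return True
    if s in ("true", "false", "null") or _NUMBER.fullmatch(s):
        return True  # would otherwise be read back as a literal
    if s.startswith("- "):  # would look like a list item
        return True
    # Hypothetical "structural token" heuristic: bracketed or braced text
    return (s[0], s[-1]) in (("[", "]"), ("{", "}"))


def quote(s: str) -> str:
    """Wrap in double quotes, escaping backslash, quote, newline, CR, and tab."""
    escaped = (
        s.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def format_string(s: str, delimiter: str = ",") -> str:
    return quote(s) if needs_quotes(s, delimiter) else s


def format_key(key: str) -> str:
    """Leave identifier-like keys bare; quote and escape everything else."""
    return key if _IDENTIFIER.fullmatch(key) else quote(key)


print(format_string("hello world"))  # hello world
print(format_string("a,b"))          # "a,b"
print(format_key("2fa"))             # "2fa"
```

Passing the active delimiter matters: with delimiter="|", a value like a,b no longer needs quotes, but a|b does.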
encode() converts any JSON-serializable value to TOON format.
Parameters:
- value – Any JSON-serializable value (object, array, primitive, or nested structure)
- indent – Number of spaces per indentation level (default: 2)
- delimiter – Delimiter for array values: ',', '\t', or '|' (default: ',')
- length_marker – Add a # prefix to array lengths, e.g., items[#3] (default: False)
Returns: A TOON-formatted string
Example:
from toon_format import encode
items = [
{"sku": "A1", "qty": 2, "price": 9.99},
{"sku": "B2", "qty": 1, "price": 14.5}
]
# Default (comma delimiter)
print(encode({"items": items}))
# Tab delimiter (often more token-efficient)
print(encode({"items": items}, delimiter='\t'))
# With length marker
print(encode({"items": items}, length_marker=True))
decode() converts a TOON-formatted string back to Python values.
Parameters:
- input_str – A TOON-formatted string to parse
- indent – Expected number of spaces per indentation level (default: 2)
- strict – Enable strict validation (default: True)
Returns: A Python value (dict, list, or primitive)
Example:
from toon_format import decode
toon = """
items[2]{sku,qty,price}:
A1,2,9.99
B2,1,14.5
"""
data = decode(toon)
# {'items': [{'sku': 'A1', 'qty': 2, 'price': 9.99}, ...]}
Strict Mode:
By default, the decoder validates input strictly:
- Invalid escape sequences raise errors
- Syntax errors such as missing colons or malformed headers raise errors
- Array length mismatches raise when the declared length doesn't match the actual count
- Delimiter mismatches raise when row delimiters don't match the header
Use strict=False for lenient parsing.
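To make the strict checks concrete, here is a sketch that parses a single comma-delimited tabular block and enforces row width and declared length. parse_tabular is hypothetical, raises plain ValueError (the library's exception types may differ), does not handle quoted strings, and treats a row split with the wrong delimiter as a width mismatch rather than detecting the delimiter itself.

```python
import re

# Comma-delimited tabular header, e.g. items[2]{sku,qty,price}:
_HEADER = re.compile(r"(\w+)\[(\d+)\]\{([^}]*)\}:")
_INT = re.compile(r"-?\d+")
_FLOAT = re.compile(r"-?\d+\.\d+")


def _parse_value(tok: str):
    """Minimal scalar parsing: literals and plain numbers; quoted strings not handled."""
    if tok in ("true", "false"):
        return tok == "true"
    if tok == "null":
        return None
    if _INT.fullmatch(tok):
        return int(tok)
    if _FLOAT.fullmatch(tok):
        return float(tok)
    return tok


def parse_tabular(text: str, strict: bool = True) -> dict:
    """Parse one tabular block, enforcing row width and declared length when strict."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = _HEADER.fullmatch(lines[0]) if lines else None
    if header is None:
        raise ValueError("malformed or missing tabular header")
    key, declared = header.group(1), int(header.group(2))
    fields = header.group(3).split(",")
    rows = []
    for ln in lines[1:]:
        cells = ln.split(",")
        # A row split with a different delimiter shows up here as a width mismatch
        if strict and len(cells) != len(fields):
            raise ValueError(f"row has {len(cells)} values, header has {len(fields)} fields")
        rows.append(dict(zip(fields, map(_parse_value, cells))))
    if strict and len(rows) != declared:
        raise ValueError(f"declared length {declared}, found {len(rows)} rows")
    return {key: rows}


doc = """
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
"""
print(parse_tabular(doc))
```

With strict=False the same mismatched input is accepted as-is, which mirrors the lenient mode described above.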
TOON works best when you show the format instead of describing it. The structure is self-documenting – models parse it naturally once they see the pattern.
Wrap your encoded data in a fenced code block (label it ```toon for clarity). The indentation and headers are usually enough – models treat it like familiar YAML or CSV. The explicit length markers ([N]) and field headers ({field1,field2}) help the model track structure.
For output, be more explicit. When you want the model to generate TOON:
- Show the expected header (users[N]{id,name,role}:). The model fills rows instead of repeating keys, reducing generation errors.
- State the rules: 2-space indent, no trailing spaces, [N] matches row count.
Example prompt:
Data is in TOON format (2-space indent, arrays show length and fields).
```toon
users[3]{id,name,role,lastLogin}:
1,Alice,admin,2025-01-15T10:30:00Z
2,Bob,user,2025-01-14T15:22:00Z
3,Charlie,user,2025-01-13T09:45:00Z
```
Task: Return only users with role "user" as TOON. Use the same header. Set [N] to match the row count. Output only the code block.
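Because the prompt asks the model to keep [N] equal to the row count, the declared length gives a cheap check on the reply. A sketch that pulls the first toon-labelled code block out of a model reply and compares [N] with the number of rows; check_toon_reply is hypothetical and assumes the block holds a single tabular array:

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this snippet markdown-safe
_BLOCK = re.compile(re.escape(FENCE) + r"toon\n(.*?)" + re.escape(FENCE), re.DOTALL)
_HEADER = re.compile(r"\w+\[(\d+)\]\{[^}]*\}:")


def check_toon_reply(reply: str) -> bool:
    """True if the first toon block has a tabular header whose [N] equals its row count."""
    block = _BLOCK.search(reply)
    if block is None:
        return False
    lines = [ln.strip() for ln in block.group(1).splitlines() if ln.strip()]
    header = _HEADER.fullmatch(lines[0]) if lines else None
    if header is None:
        return False
    return int(header.group(1)) == len(lines) - 1


reply = (
    "Here are the matching users:\n"
    + FENCE + "toon\n"
    + "users[2]{id,name,role,lastLogin}:\n"
    + "  2,Bob,user,2025-01-14T15:22:00Z\n"
    + "  3,Charlie,user,2025-01-13T09:45:00Z\n"
    + FENCE
)
print(check_toon_reply(reply))  # True
```

A failed check is a signal to re-prompt or to decode the reply in non-strict mode and inspect it.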
💡 Tip: For large uniform tables, use encode(data, delimiter='\t') and tell the model "fields are tab-separated." Tabs often tokenize better than commas.
- Format familiarity and structure matter as much as token count. TOON's tabular format requires arrays of objects with identical keys and primitive values only. When this doesn't hold (mixed types, non-uniform objects, nested structures), TOON switches to list format, where JSON can be more efficient at scale.
  - TOON excels at: uniform arrays of objects (same fields, primitive values), especially large datasets with consistent structure
  - JSON is better for: non-uniform data, deeply nested structures, and objects with varying field sets
- Token counts vary by tokenizer and model. Benchmarks use a GPT-style tokenizer; actual savings will differ with other models.
- TOON is designed for LLM input where human readability and token efficiency matter. It's not a drop-in replacement for JSON in APIs or storage.
For precise formatting rules and implementation details, see the full specification (currently v1.3).
The conformance tests provide language-agnostic test fixtures that validate implementations across any language.
- JavaScript/TypeScript: @toon-format/toon (reference implementation)
- .NET: ToonSharp
- Crystal: toon-crystal
- Dart: toon
- Elixir: toon_ex
- Gleam: toon_codec
- Go: gotoon
- Java: JToon
- OCaml: ocaml-toon
- PHP: toon-php
- Python: python-toon, pytoon, py-toon (this package)
- Ruby: toon-ruby
- Rust: rtoon
- Swift: TOONEncoder
MIT License © 2025 Ronak
Contributions are welcome! Please feel free to submit a Pull Request.
When implementing features, please follow the TOON specification to ensure compatibility across implementations.
TOON format specification and reference implementation by Johann Schopplich and contributors.
This Python implementation follows the official TOON specification v1.3.