py-toon

Token-Oriented Object Notation (TOON) is a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage. It's intended for LLM input, not output.

TOON's sweet spot is uniform arrays of objects – multiple fields per row, same structure across items. It borrows YAML's indentation-based structure for nested objects and CSV's tabular format for uniform data rows, then optimizes both for token efficiency in LLM contexts. For deeply nested or non-uniform data, JSON may be more efficient.

💡 Tip: Think of TOON as a translation layer: use JSON programmatically, convert to TOON for LLM input.

Why TOON?

AI is becoming cheaper and more accessible, but larger context windows also invite larger data inputs. LLM tokens still cost money – and standard JSON is verbose and token-expensive:

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

TOON conveys the same information with fewer tokens:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

Why create a new format? Because existing alternatives don't fit:

JSON: Too verbose for tabular data
CSV: No nested structures
YAML: Better than JSON, but still repeats keys
Protocol Buffers/MessagePack: Binary formats requiring schema definitions

TOON bridges these gaps with a text format optimized for token efficiency and LLM-friendly guardrails.

Key Features

💸 Token-efficient: typically 30–60% fewer tokens than JSON
🤿 LLM-friendly guardrails: explicit lengths and fields enable validation
🍱 Minimal syntax: removes redundant punctuation (braces, brackets, most quotes)
📐 Indentation-based structure: like YAML, uses whitespace instead of braces
🧺 Tabular arrays: declare keys once, stream data as rows

Benchmarks

Token Efficiency

⭐ GitHub Repositories       ██████████████░░░░░░░░░░░    8,745 tokens
                             vs JSON (-42.3%)           15,145
                             vs JSON compact (-23.7%)   11,455
                             vs YAML (-33.4%)           13,129
                             vs XML (-48.8%)            17,095

📈 Daily Analytics           ██████████░░░░░░░░░░░░░░░    4,507 tokens
                             vs JSON (-58.9%)           10,977
                             vs JSON compact (-35.7%)    7,013
                             vs YAML (-48.8%)            8,810
                             vs XML (-65.7%)            13,128

🛒 E-Commerce Order          ████████████████░░░░░░░░░      166 tokens
                             vs JSON (-35.4%)              257
                             vs JSON compact (-2.9%)       171
                             vs YAML (-15.7%)              197
                             vs XML (-38.7%)               271

─────────────────────────────────────────────────────────────────────
Total                        ██████████████░░░░░░░░░░░   13,418 tokens
                             vs JSON (-49.1%)           26,379
                             vs JSON compact (-28.0%)   18,639
                             vs YAML (-39.4%)           22,136
                             vs XML (-56.0%)            30,494

Note: Token savings are measured against formatted JSON (2-space indentation). Measured with gpt-tokenizer using o200k_base encoding (GPT-5 tokenizer). Actual savings vary by model and tokenizer.
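
The percentages in the chart are the share of tokens saved relative to the other format, i.e. 1 - TOON / other. As a quick sanity check, the sketch below recomputes the "vs JSON" column from the token counts listed above; the `savings` helper and variable names are illustrative only and not part of the package.

```python
# Token counts copied from the chart above: (TOON, formatted JSON).
counts = {
    "GitHub Repositories": (8745, 15145),
    "Daily Analytics": (4507, 10977),
    "E-Commerce Order": (166, 257),
}


def savings(toon_tokens: int, other_tokens: int) -> float:
    """Percentage of tokens saved relative to the other format, to one decimal."""
    return round((1 - toon_tokens / other_tokens) * 100, 1)


per_dataset = {name: savings(t, j) for name, (t, j) in counts.items()}
total = savings(
    sum(t for t, _ in counts.values()),
    sum(j for _, j in counts.values()),
)

print(per_dataset)
print(total)
```

The results match the chart: 42.3%, 58.9% and 35.4% per dataset, and 49.1% overall.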

Installation & Quick Start

# Clone the repository
git clone https://github.com/ronak/py-toon.git
cd py-toon

# Install locally
pip install -e .

# Or install directly from GitHub
pip install git+https://github.com/ronak/py-toon.git

Example Usage

from toon_format import encode, decode

# Encode Python data to TOON
data = {
    "users": [
        {"id": 1, "name": "Alice", "role": "admin"},
        {"id": 2, "name": "Bob", "role": "user"}
    ]
}

toon_string = encode(data)
print(toon_string)
# users[2]{id,name,role}:
#   1,Alice,admin
#   2,Bob,user

# Decode TOON back to Python
original_data = decode(toon_string)
print(original_data)
# {'users': [{'id': 1, 'name': 'Alice', 'role': 'admin'}, ...]}

Format Overview

Objects

Simple objects with primitive values:

encode({
    "id": 123,
    "name": "Ada",
    "active": True
})

id: 123
name: Ada
active: true

Nested objects:

encode({
    "user": {
        "id": 123,
        "name": "Ada"
    }
})

user:
  id: 123
  name: Ada
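
To make the indentation rule concrete, here is a minimal standalone sketch of how nested objects with primitive values could be rendered. It is not the package's implementation: `encode_object` and `format_primitive` are hypothetical helpers, and quoting and arrays are deliberately left out.

```python
def format_primitive(value) -> str:
    """Render a primitive the way the examples above show it."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before falling through to str()
        return "true" if value else "false"
    return str(value)


def encode_object(obj: dict, indent: int = 2, level: int = 0) -> str:
    """Emit `key: value` lines, indenting one level per nested object."""
    pad = " " * (indent * level)
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            if value:  # an empty object is just the bare `key:` line
                lines.append(encode_object(value, indent, level + 1))
        else:
            lines.append(f"{pad}{key}: {format_primitive(value)}")
    return "\n".join(lines)


print(encode_object({"user": {"id": 123, "name": "Ada"}}))
```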

Arrays

Primitive Arrays (Inline)

encode({"tags": ["admin", "ops", "dev"]})

tags[3]: admin,ops,dev

Arrays of Objects (Tabular)

When all objects share the same primitive fields, TOON uses an efficient tabular format:

encode({
    "items": [
        {"sku": "A1", "qty": 2, "price": 9.99},
        {"sku": "B2", "qty": 1, "price": 14.5}
    ]
})

items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
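
The tabular requirement (every item is an object with the same fields and only primitive values) can be sketched as follows. This is an illustration of the idea, not the package's encoder: `is_tabular` and `encode_table` are hypothetical names, only the comma delimiter is handled, and string quoting is ignored.

```python
PRIMITIVES = (str, int, float, bool, type(None))


def is_tabular(rows) -> bool:
    """True if rows is a non-empty list of dicts with identical keys and primitive values."""
    if not isinstance(rows, list) or not rows:
        return False
    if not all(isinstance(row, dict) for row in rows):
        return False
    fields = set(rows[0])
    return bool(fields) and all(
        set(row) == fields and all(isinstance(v, PRIMITIVES) for v in row.values())
        for row in rows
    )


def format_cell(value) -> str:
    """Render one cell; quoting rules are out of scope for this sketch."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def encode_table(key: str, rows: list) -> str:
    """Declare the fields once in the header, then emit one row per item."""
    if not is_tabular(rows):
        raise ValueError("rows are not uniform; list format would be needed")
    fields = list(rows[0])  # field order follows the first item
    lines = [f"{key}[{len(rows)}]{{{','.join(fields)}}}:"]
    for row in rows:
        lines.append("  " + ",".join(format_cell(row[f]) for f in fields))
    return "\n".join(lines)


items = [
    {"sku": "A1", "qty": 2, "price": 9.99},
    {"sku": "B2", "qty": 1, "price": 14.5},
]
print(encode_table("items", items))
```

A check like `is_tabular` is also a cheap way to decide up front whether a dataset is likely to benefit from TOON at all.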

Mixed and Non-Uniform Arrays

Arrays that don't meet the tabular requirements use list format:

items[3]:
  - 1
  - a: 1
  - text

When objects appear in list format, the first field is placed on the hyphen line:

items[2]:
  - id: 1
    name: First
  - id: 2
    name: Second
    extra: true

Arrays of Arrays

encode({"pairs": [[1, 2], [3, 4]]})

pairs[2]:
  - [2]: 1,2
  - [2]: 3,4

Empty Arrays and Objects

encode({"items": []})  # items[0]:
encode([])              # [0]:
encode({})              # (empty output)
encode({"config": {}})  # config:

Quoting Rules

TOON quotes strings only when necessary to maximize token efficiency:

Inner spaces are allowed; leading or trailing spaces force quotes
Unicode and emoji are safe unquoted
Quotes and control characters are escaped with backslash

String values are quoted when:

Empty string: ""
Leading or trailing spaces: " padded ", " "
Contains active delimiter, colon, quote, backslash, or control chars: "a,b", "a:b", "say \"hi\""
Looks like boolean/number/null: "true", "42", "null"
Starts with "- " (list-like): "- item"
Looks like structural token: "[5]", "{key}"

Object keys are unquoted if they match the identifier pattern (start with letter or underscore, followed by letters, digits, underscores, or dots). All other keys must be quoted.
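
The rules above can be approximated in a few lines. The sketch below is a reading of this section, not the package's code: `needs_quotes` and `key_needs_quotes` are hypothetical helpers, and the number and structural-token patterns are assumptions that may be narrower or broader than the specification.

```python
import re

# Assumed patterns; the specification is the authority on the exact grammar.
KEY_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")
NUMBER_PATTERN = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")
STRUCTURAL_PATTERN = re.compile(r"\[.*\]|\{.*\}")


def needs_quotes(value: str, delimiter: str = ",") -> bool:
    """Decide whether a string value must be quoted under the rules listed above."""
    if value == "":
        return True
    if value != value.strip(" "):  # leading or trailing spaces
        return True
    if value in ("true", "false", "null") or NUMBER_PATTERN.fullmatch(value):
        return True
    if any(ch in value for ch in (delimiter, ":", '"', "\\")):
        return True
    if any(ord(ch) < 0x20 for ch in value):  # control characters
        return True
    if value.startswith("- "):  # would read as a list item
        return True
    if STRUCTURAL_PATTERN.fullmatch(value):  # e.g. "[5]" or "{key}"
        return True
    return False


def key_needs_quotes(key: str) -> bool:
    """Keys stay unquoted only if they match the identifier pattern."""
    return KEY_PATTERN.fullmatch(key) is None


for sample in ["hello world", " padded ", "a,b", "42", "- item", "[5]"]:
    print(repr(sample), needs_quotes(sample))
```

Note that the delimiter check depends on the active delimiter: with `delimiter="\t"`, a value such as `a,b` no longer needs quotes under this sketch.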

API

`encode(value, indent=2, delimiter=',', length_marker=False)`

Converts any JSON-serializable value to TOON format.

Parameters:

value – Any JSON-serializable value (object, array, primitive, or nested structure)
indent – Number of spaces per indentation level (default: 2)
delimiter – Delimiter for array values: ',', '\t', or '|' (default: ',')
length_marker – Add # prefix to array lengths, e.g., items[#3] (default: False)

Returns: A TOON-formatted string

Example:

from toon_format import encode

items = [
    {"sku": "A1", "qty": 2, "price": 9.99},
    {"sku": "B2", "qty": 1, "price": 14.5}
]

# Default (comma delimiter)
print(encode({"items": items}))

# Tab delimiter (often more token-efficient)
print(encode({"items": items}, delimiter='\t'))

# With length marker
print(encode({"items": items}, length_marker=True))

`decode(input_str, indent=2, strict=True)`

Converts a TOON-formatted string back to Python values.

Parameters:

input_str – A TOON-formatted string to parse
indent – Expected number of spaces per indentation level (default: 2)
strict – Enable strict validation (default: True)

Returns: A Python value (dict, list, or primitive)

Example:

from toon_format import decode

toon = """
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
"""

data = decode(toon)
# {'items': [{'sku': 'A1', 'qty': 2, 'price': 9.99}, ...]}

Strict Mode:

By default, the decoder validates input strictly:

Invalid escape sequences raise an error
Syntax errors such as missing colons or malformed headers raise an error
Array length mismatches raise an error when the declared length doesn't match the actual count
Delimiter mismatches raise an error when row delimiters don't match the header

Use strict=False for lenient parsing.
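
To illustrate the kind of check strict mode performs, here is a standalone sketch that validates a single top-level, comma-delimited tabular block. It is not the package's decoder: `check_table` is a hypothetical helper, it reports problems as strings instead of raising, and it splits rows naively, so quoted values containing commas are not handled.

```python
import re

# Matches headers such as: items[2]{sku,qty,price}:
HEADER = re.compile(r"(?P<key>[A-Za-z_][A-Za-z0-9_.]*)\[(?P<n>\d+)\]\{(?P<fields>[^}]*)\}:")


def check_table(toon_text: str) -> list:
    """Return a list of problems found; an empty list means the block looks consistent."""
    lines = [line for line in toon_text.splitlines() if line.strip()]
    if not lines:
        return ["empty input"]
    match = HEADER.fullmatch(lines[0].strip())
    if match is None:
        return ["malformed header"]

    declared = int(match["n"])
    fields = match["fields"].split(",")
    rows = lines[1:]

    problems = []
    if len(rows) != declared:
        problems.append(f"declared {declared} rows, found {len(rows)}")
    for number, row in enumerate(rows, start=1):
        cells = row.strip().split(",")  # naive split: no quoted-comma support
        if len(cells) != len(fields):
            problems.append(f"row {number} has {len(cells)} fields, expected {len(fields)}")
    return problems


good = """
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
"""
bad = """
items[3]{sku,qty}:
  A1,2
  B2,1,14.5
"""
print(check_table(good))
print(check_table(bad))
```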

Using TOON in LLM Prompts

TOON works best when you show the format instead of describing it. The structure is self-documenting – models parse it naturally once they see the pattern.

Sending TOON to LLMs (Input)

Wrap your encoded data in a fenced code block (label it ```toon for clarity). The indentation and headers are usually enough – models treat it like familiar YAML or CSV. The explicit length markers ([N]) and field headers ({field1,field2}) help the model track structure.

Generating TOON from LLMs (Output)

For output, be more explicit. When you want the model to generate TOON:

Show the expected header (users[N]{id,name,role}:). The model fills rows instead of repeating keys, reducing generation errors.
State the rules: 2-space indent, no trailing spaces, [N] matches row count.

Example prompt:

Data is in TOON format (2-space indent, arrays show length and fields).

```toon
users[3]{id,name,role,lastLogin}:
  1,Alice,admin,2025-01-15T10:30:00Z
  2,Bob,user,2025-01-14T15:22:00Z
  3,Charlie,user,2025-01-13T09:45:00Z
```

Task: Return only users with role "user" as TOON. Use the same header. Set [N] to match the row count. Output only the code block.

💡 Tip: For large uniform tables, use encode(data, delimiter='\t') and tell the model "fields are tab-separated." Tabs often tokenize better than commas.
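
A prompt like the one above can be assembled programmatically. This is a minimal sketch, assuming you already hold the TOON text as a string (for example from `encode`); `build_prompt` is a hypothetical helper, not part of the package.

```python
# Built indirectly so this snippet can itself sit inside a fenced block.
FENCE = "`" * 3


def build_prompt(toon_text: str, task: str) -> str:
    """Wrap TOON data in a fence labeled `toon` and append the task."""
    return "\n".join([
        "Data is in TOON format (2-space indent, arrays show length and fields).",
        "",
        f"{FENCE}toon",
        toon_text.strip("\n"),
        FENCE,
        "",
        f"Task: {task}",
    ])


toon_text = """users[3]{id,name,role,lastLogin}:
  1,Alice,admin,2025-01-15T10:30:00Z
  2,Bob,user,2025-01-14T15:22:00Z
  3,Charlie,user,2025-01-13T09:45:00Z
"""

prompt = build_prompt(
    toon_text,
    'Return only users with role "user" as TOON. Use the same header. '
    "Set [N] to match the row count. Output only the code block.",
)
print(prompt)
```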

Notes and Limitations

Format familiarity and structure matter as much as token count. TOON's tabular format requires arrays of objects with identical keys and primitive values only. When this doesn't hold (mixed types, non-uniform objects, nested structures), TOON switches to list format, where JSON can be more efficient at scale.
- TOON excels at: Uniform arrays of objects (same fields, primitive values), especially large datasets with consistent structure
- JSON is better for: Non-uniform data, deeply nested structures, and objects with varying field sets
Token counts vary by tokenizer and model. Benchmarks use a GPT-style tokenizer; actual savings will differ with other models.
TOON is designed for LLM input where human readability and token efficiency matter. It's not a drop-in replacement for JSON in APIs or storage.

Full Specification

For precise formatting rules and implementation details, see the full specification (currently v1.3).

The conformance tests provide language-agnostic test fixtures that validate implementations across any language.

Other Implementations

JavaScript/TypeScript: @toon-format/toon (reference implementation)
.NET: ToonSharp
Crystal: toon-crystal
Dart: toon
Elixir: toon_ex
Gleam: toon_codec
Go: gotoon
Java: JToon
OCaml: ocaml-toon
PHP: toon-php
Python: python-toon, pytoon, py-toon (this package)
Ruby: toon-ruby
Rust: rtoon
Swift: TOONEncoder

License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

When implementing features, please follow the TOON specification to ensure compatibility across implementations.

Acknowledgments

TOON format specification and reference implementation by Johann Schopplich and contributors.

This Python implementation follows the official TOON specification v1.3.
