A Model Context Protocol (MCP) server for aggressive context compaction in Claude Code. Save 90%+ tokens by compacting web research, task outputs, and conversations into beautiful, structured snapshots.
Inspired by a talk by Dexter Horthy from HumanLayer, his team's work on ACE (Advanced Context Engineering for Coding Agents) and 12-Factor Agents, and the recent release of the TOON (Token-Oriented Object Notation) format.
Requirements:
npm install -g claude-praetorian-mcp
From shell:
claude mcp add claude-praetorian-mcp -- bunx claude-praetorian-mcp
From inside Claude (restart required):
Add this to our global mcp config: bunx claude-praetorian-mcp
Install this mcp: https://github.com/Vvkmnn/claude-praetorian-mcp
From any manually configurable mcp.json: (Cursor, Windsurf, etc.)
{
  "mcpServers": {
    "praetorian": {
      "command": "bunx",
      "args": ["claude-praetorian-mcp"],
      "env": {}
    }
  }
}
MCP server for aggressive context compaction. Generates structured incremental snapshots to yield 90%+ token savings and easily refresh context with "frequent intentional compaction".
Currently runs project by project and saves artifacts to {$project}/.claude/praetorian via the following tools (and a royal guard ⚜️):
(Incrementally) compact context using the TOON format to get the most valuable tokens from an activity.
⚜️ praetorian_compact type=<type> title=<title>
> "ACE Framework research - save 1,450 tokens"
> "Icon rendering bug investigation - compact the findings"
> "Database architecture decisions - preserve the rationale"
> "WebFetch results from authentication docs"
> "Task output from explore subagent - code structure analysis"
⚜️ compact | Created
┌─ ⚜️ ────────────────────────────────────────────────── Created ─┐
│ Compacted: "ACE Framework Research" • 1,450 tokens saved
│ Type: web_research • ID: cpt_1765245902396_nxetoc
└───────────────────────────────────────────────────────────────────┘
⚜️ compact | Merged
┌─ ⚜️ ───────────────────────────────────────────────────── Merged ─┐
│ Compacted: "Authentication Patterns" • 890 tokens saved
│ Type: decisions • ID: cpt_1765245903512_xk9mp1
│ Merged with: cpt_1765245903512_xk9mp1
└────────────────────────────────────────────────────────────────────┘
Search and restore context by injecting TOON tokens back into current context as needed.
⚜️ praetorian_restore query=<query>
> "What did we learn about authentication?"
> "Find the Docker container debugging session"
> "Show recent architecture decisions"
> "Search for MCP server implementation patterns"
> "" (empty = recent compactions)
⚜️ restore | Search
┌─ ⚜️ ───────────────────────────────────────────────────── Search ─┐
│ Found 2 compactions
│ Query: "authentication"
└────────────────────────────────────────────────────────────────────┘
⚜️ restore | Recent
┌─ ⚜️ ───────────────────────────────────────────────────── Recent ─┐
│ Found 3 compactions
└────────────────────────────────────────────────────────────────────┘
Status indicators:
- Created - New compaction saved
- Merged - Updated existing compaction (>70% title similarity)
- Search - Search results returned (keyword matching)
- Recent - Recent compactions listed (by updated time)
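The Merged status depends on title similarity, which the architecture overview describes as Jaccard similarity with a 70% threshold. A minimal sketch of that decision follows. The function names and the tokenizer (lowercase, split on non-alphanumeric characters) are assumptions for illustration, not the server's actual code.

```typescript
// Hypothetical tokenizer: lowercase the title and split on non-alphanumerics.
function tokenize(title: string): Set<string> {
  const parts = title.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
  return new Set(parts);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two word sets.
function jaccard(a: string, b: string): number {
  const setA = tokenize(a);
  const setB = tokenize(b);
  let intersection = 0;
  setA.forEach((w) => {
    if (setB.has(w)) intersection++;
  });
  const union = setA.size + setB.size - intersection;
  // Two empty titles are treated as dissimilar (an assumption) to avoid 0/0.
  return union === 0 ? 0 : intersection / union;
}

// Threshold taken from the README (>70%); the constant name is hypothetical.
const MERGE_THRESHOLD = 0.7;

function shouldMerge(newTitle: string, existingTitle: string): boolean {
  return jaccard(newTitle, existingTitle) > MERGE_THRESHOLD;
}

console.log(jaccard("ACE Framework Research", "ACE framework research notes")); // 0.75
console.log(shouldMerge("Authentication Patterns", "Database architecture decisions")); // false
```

Under these assumptions, two titles sharing three of four distinct words score 0.75 and would merge, while unrelated titles score 0 and produce a new compaction.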
Praetorian is designed for heavy, frequent use. The more you compact, the more you save.
When to compact:
- ✅ After every WebFetch
- ✅ After every Task/subagent completes
- ✅ After reading multiple files
- ✅ After making decisions
- ✅ During long conversations (proactive compaction)
- ✅ Before context gets >60% full
Real-world example session:
| Compaction | Before | After | Saved |
|---|---|---|---|
| Web research (3 URLs) | 4,500 | 300 | 4,200 |
| Subagent outputs (2) | 3,500 | 300 | 3,200 |
| Architecture debates | 5,000 | 300 | 4,700 |
| Hook research | 1,500 | 150 | 1,350 |
| Total | 14,500 | 1,050 | 13,450 (93%) |
Next session: praetorian_restore loads the ~1,000 compacted tokens. Instant resume, no re-research.
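The totals in the table can be checked with a few lines of arithmetic. The sketch below uses only the before/after figures from the example session; the `Row` type and `summarize` function are hypothetical names introduced for illustration.

```typescript
interface Row {
  label: string;
  before: number; // tokens before compaction
  after: number;  // tokens after compaction
}

// Figures copied from the example session table.
const session: Row[] = [
  { label: "Web research (3 URLs)", before: 4500, after: 300 },
  { label: "Subagent outputs (2)", before: 3500, after: 300 },
  { label: "Architecture debates", before: 5000, after: 300 },
  { label: "Hook research", before: 1500, after: 150 },
];

function summarize(rows: Row[]) {
  const before = rows.reduce((sum, r) => sum + r.before, 0);
  const after = rows.reduce((sum, r) => sum + r.after, 0);
  const saved = before - after;
  const percent = Math.round((saved / before) * 100);
  return { before, after, saved, percent };
}

console.log(summarize(session));
// { before: 14500, after: 1050, saved: 13450, percent: 93 }
```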
How claude-praetorian-mcp works:
claude-praetorian-mcp
======================================
praetorian_compact (write)
--------------------------
INPUT ──> VALIDATE ──> DETECT ──> MERGE ──> ENCODE ──> INDEX ──> OUTPUT
             │           │          │         │          │
             │           │          │         │          └─ words -> IDs
             │           │          │         └─ .toon (30-60% smaller)
             │           │          └─ dedupe arrays, combine objects
             │           └─ Jaccard similarity > 70% = auto-merge
             └─ Zod schemas (CompactInput, Compaction)
praetorian_restore (read)
-------------------------
                    ┌─────────┐
QUERY ──> SEARCH ───┤         ├──> DECODE ──> OUTPUT
                    │  INDEX  │
(none) ──> RECENT ──┤         │
                    └─────────┘
Storage: .claude/praetorian/
----------------------------
index.json Inverted word index + compaction metadata
compactions/*.toon TOON-encoded compaction files
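The "words -> IDs" step and the keyword search it enables can be illustrated with a small in-memory inverted index. This is a sketch under assumptions: the actual layout of index.json is not documented here, and `InvertedIndex`, `addToIndex`, and `search` are hypothetical names.

```typescript
// Maps each lowercase word to the set of compaction IDs that contain it.
type InvertedIndex = Map<string, Set<string>>;

function words(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

function addToIndex(index: InvertedIndex, id: string, text: string): void {
  for (const w of words(text)) {
    if (!index.has(w)) index.set(w, new Set<string>());
    index.get(w)!.add(id);
  }
}

// Returns IDs ranked by how many distinct query words they match
// (ties broken by ID so the order is deterministic).
function search(index: InvertedIndex, query: string): string[] {
  const counts = new Map<string, number>();
  new Set(words(query)).forEach((w) => {
    const ids = index.get(w);
    if (!ids) return;
    ids.forEach((id) => counts.set(id, (counts.get(id) ?? 0) + 1));
  });
  return Array.from(counts.entries())
    .sort((x, y) => y[1] - x[1] || x[0].localeCompare(y[0]))
    .map(([id]) => id);
}

const index: InvertedIndex = new Map();
addToIndex(index, "cpt_a", "Authentication Patterns");
addToIndex(index, "cpt_b", "Database architecture decisions");
addToIndex(index, "cpt_c", "OAuth authentication flow");

console.log(search(index, "authentication"));      // [ 'cpt_a', 'cpt_c' ]
console.log(search(index, "authentication flow")); // [ 'cpt_c', 'cpt_a' ]
```

A lookup touches only the sets for the query's words, which is why this kind of search needs no embeddings or database.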
Core optimizations:
- TOON format: 30-60% fewer tokens than YAML/JSON
- Zod validation: Production-grade runtime type safety
- Jaccard similarity: Smart deduplication via title matching (>70% threshold)
- Inverted index: Fast keyword search without vector embeddings
- Smart merging: Combine similar compactions without duplication
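The merge step is described as "dedupe arrays, combine objects". One way to read that is a recursive merge like the sketch below. The `Value` type, the `mergeValues` name, the JSON-based equality check, and the "newer scalar wins" rule are all assumptions, not the server's confirmed behavior.

```typescript
type Value = string | number | boolean | null | Value[] | { [key: string]: Value };

function isPlainObject(v: Value): v is { [key: string]: Value } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function mergeValues(existing: Value, incoming: Value): Value {
  // Arrays: concatenate, dropping items whose JSON form was already seen.
  if (Array.isArray(existing) && Array.isArray(incoming)) {
    const seen = new Set<string>();
    const out: Value[] = [];
    for (const item of existing.concat(incoming)) {
      const key = JSON.stringify(item);
      if (!seen.has(key)) {
        seen.add(key);
        out.push(item);
      }
    }
    return out;
  }
  // Objects: keep all keys, merging recursively where both sides define one.
  if (isPlainObject(existing) && isPlainObject(incoming)) {
    const out: { [key: string]: Value } = { ...existing };
    for (const key of Object.keys(incoming)) {
      out[key] = key in out ? mergeValues(out[key], incoming[key]) : incoming[key];
    }
    return out;
  }
  // Scalars or mismatched shapes: the newer value wins (assumption).
  return incoming;
}

const merged = mergeValues(
  { decisions: ["use JWT"], stack: { db: "postgres" } },
  { decisions: ["use JWT", "rotate keys"], stack: { cache: "redis" } },
);
console.log(JSON.stringify(merged));
// {"decisions":["use JWT","rotate keys"],"stack":{"db":"postgres","cache":"redis"}}
```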
File access:
- Stores in: <project>/.claude/praetorian/
- TOON format: .toon files (40% fewer bytes than YAML/XML)
- Zero database dependencies (no db calls or filesystem)
- Never leaves your machine
git clone https://github.com/Vvkmnn/claude-praetorian-mcp && cd claude-praetorian-mcp
npm install && npm run build
Package requirements:
- Node.js: >=20.0.0 (ES modules)
- npm: >=10.0.0 (package-lock v3)
- Runtime: @modelcontextprotocol/sdk, @toon-format/toon, zod
- Zero external databases - works with bunx
Development workflow:
npm run build # TypeScript compilation
npm run watch # Watch mode with tsc --watch
node dist/index.js # Run MCP server directly (stdio)
Contributing:
- Fork the repository and create feature branches
- Test with multiple compaction types before submitting PRs
- Follow TypeScript strict mode and MCP protocol
Tiberius Claudius Caesar Augustus Germanicus - Declared emperor by his Praetorian Guard
