Intent to product. Describe what you want, Otto builds it, certifies it works, and fixes what doesn't.
One autonomous agent session: plan → build → test → certify → fix → re-certify → ship.
otto build "bookmark manager with tags and search"
Otto launches a single coding agent that:
- Plans — reads the intent, designs architecture
- Builds — implements the product, dispatches subagents for parallel work on independent features
- Tests — writes comprehensive tests, runs them, fixes failures
- Certifies — dispatches a certifier agent (builder-blind) that tests the product as a real user
- Fixes — if certification fails, reads the findings, fixes the code, re-certifies
- Ships — commits when certification passes
The certifier is a separate agent that doesn't know how the product was built. It reads the project fresh, installs deps, starts the app, and runs real user stories (curl for APIs, CLI commands for tools, import for libraries, agent-browser for web UIs). It reports what's wrong — the coding agent figures out the fix.
Agentic mode — one agent builds, certifies, fixes
All journeys passed (round 1)
✓ Registration, login, and first note creation work with JWT token flow
✓ Create/Read/Update/Delete notes with tags return correct status codes
✓ Cross-user read/update/delete blocked; list only shows own notes
✓ Unauthenticated and invalid-token requests rejected
✓ Tag and keyword search return correct results
✓ Data survives server restart
✓ Input validation, special characters, boundary conditions handled
Build Summary (build-1775522841-19595)
Stories: 7 passed, 0 failed
Total cost: $1.04
Duration: 4.8 min
# Install
uv pip install -e .
# In any git repo
cd your-project
otto build "REST API for a todo app with user auth"Otto handles any product type:
- CLI tools — argparse, Click, Cobra (curl + CLI command testing)
- REST APIs — Express, Flask, FastAPI (curl + HTTP testing)
- Libraries — Python, Node.js (import + unit testing)
- Web apps — server-rendered HTML (curl + agent-browser visual testing)
- Hybrid — API + CLI + UI tested across all surfaces
otto build "intent"
│
└─ Coding Agent (one session, drives everything)
│
├─ Plan architecture
├─ Build (subagents for parallel features)
├─ Write tests, run, fix
├─ Commit
│
├─ Dispatch Certifier Agent (builder-blind)
│ ├─ Read project fresh
│ ├─ Install deps, start app
│ ├─ Discover auth (once, share with subagents)
│ ├─ Dispatch test subagents (parallel stories)
│ │ ├─ First Experience
│ │ ├─ CRUD Lifecycle
│ │ ├─ Data Isolation
│ │ ├─ Persistence / Access Control
│ │ └─ Edge Cases
│ └─ Report: PASS/FAIL per story + evidence
│
├─ If FAIL: read findings, fix code, commit
├─ Re-dispatch certifier
└─ Repeat until PASS
The certifier reports symptoms only — no fix suggestions. The coding agent diagnoses and fixes.
otto build "intent" # default: agent-driven (recommended)
otto build "intent" --split # separate build + certify sessions
otto build "intent" --orchestrated # legacy PER pipeline with task queue
otto build "intent" Build a product from natural language intent
otto add "prompt" Add a task to the queue (for --orchestrated mode)
otto run Run pending tasks (--orchestrated mode)
otto status Show task states
otto logs <id> Show agent logs
otto.yaml (auto-created on first run):
# Model (optional — defaults to provider's best)
# model: claude-sonnet-4-5-20250514
# Product certification
# certifier_timeout: 900 # max seconds for build+certify
# certifier_browser: null # null = auto-detect; true/false to force
# certifier_interaction: null # override product type (http/cli/import)otto_logs/
builds/<build-id>/
agent.log Key events: commits, certifier rounds, verdict
agent-raw.log Full agent output (for deep debugging)
checkpoint.json Build metadata: cost, duration, stories
certifier/
proof-of-work.json Machine-readable: stories, evidence, rounds
proof-of-work.html Human-readable: styled report with evidence
proof-of-work.md Markdown summary
otto/
pipeline.py Build pipelines (v3 default + v2 split + orchestrated)
certifier/
__init__.py Agentic certifier (single agent + subagents)
report.py CertificationReport, Finding, Outcome dataclasses
agent.py Agent SDK wrapper (query, run_agent_query)
session.py AgentSession for split mode (resume across rounds)
cli.py CLI commands (build, add, run, status)
config.py Config loading, otto.yaml creation
orchestrator.py PER pipeline (--orchestrated mode)
runner.py Task runner for orchestrated mode
qa.py QA agent for orchestrated mode
display.py Live terminal display
observability.py Log writing utilities
- Python 3.11+
- Claude Code CLI installed and authenticated
- Git repository
- Optional:
agent-browserCLI for visual web app testing
MIT