A generic, configurable harness for long-running autonomous coding agents. Built on the Claude Agent SDK, it implements Anthropic's guide for effective agent harnesses, featuring phase-driven execution, configurable MCP tools, and SDK-native sandbox isolation.
Agent Harness provides:
- Phase-based workflows — declarative phase definitions with conditions and run-once semantics
- TOML-based configuration — no code changes needed to customize behavior
- SDK-native security — OS sandbox with network isolation, declarative permission rules (allow/deny), secure defaults
- Progress tracking — JSON checklist, notes file, or none with automatic completion detection
- Error recovery — exponential backoff and circuit breaker to prevent runaway costs
- MCP server support — browser automation, databases, etc.
- Session persistence — auto-continue across sessions with state tracking
- Setup verification — check auth, tools, config before running
git clone <repo-url>
cd claude-agent-harness
uv sync
Alternative using pip
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
If using pip, drop the uv run prefix from all commands below (for example, python -m agent_harness verify).
Export one of these environment variables:
- ANTHROPIC_API_KEY: get one from console.anthropic.com
- CLAUDE_CODE_OAUTH_TOKEN: via claude setup-token
See .env.example for all options.
Using 1Password CLI
If you manage secrets with 1Password CLI, create a .env file with an op:// reference:
ANTHROPIC_API_KEY="op://Vault/Item/api_key"
Then wrap any command with op run:
op run --env-file "./.env" -- uv run python -m agent_harness run --project-dir ./my-project
# Scaffold a new project configuration
uv run python -m agent_harness init --project-dir ./my-project
# Edit the configuration
# -> ./my-project/.agent-harness/config.toml
# Verify setup
uv run python -m agent_harness verify --project-dir ./my-project
# Run the agent
uv run python -m agent_harness run --project-dir ./my-project
# Run the agent
uv run python -m agent_harness run --project-dir <path> [options]
# Verify setup (auth, dependencies, config)
uv run python -m agent_harness verify [--project-dir <path>]
# Scaffold new project configuration
uv run python -m agent_harness init --project-dir <path>
# Global flags (all commands)
--project-dir PATH # Agent's working directory (default: .)
# Run command options
--max-iterations N # Override max iterations (default: from config)
--model MODEL # Override model (default: from config)
The harness executes agents in configurable phases with conditions and run-once semantics. Each phase gets a fresh Claude SDK session (no context carryover) with a configured prompt.
Phase execution:
- Phases run sequentially based on conditions (exists:, not_exists: path checks)
- run_once: true phases skip after first successful completion
- State persists in .agent-harness/session.json
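The phase rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the harness's actual code: the Phase dataclass, the eligible_phases helper, and the exact condition string format ("exists:<path>", "not_exists:<path>") are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Phase:
    # Hypothetical stand-in for one phase definition from config.toml.
    name: str
    condition: str = ""   # assumed format: "exists:<path>" or "not_exists:<path>"
    run_once: bool = False


def condition_met(condition: str, project_dir: Path) -> bool:
    """Evaluate an exists:/not_exists: path check relative to project_dir."""
    if not condition:
        return True  # a phase without a condition is always eligible
    kind, _, rel_path = condition.partition(":")
    present = (project_dir / rel_path).exists()
    if kind == "exists":
        return present
    if kind == "not_exists":
        return not present
    raise ValueError(f"unknown condition: {condition!r}")


def eligible_phases(phases, completed, project_dir):
    """Return, in declaration order, the phases that would run this session."""
    result = []
    for phase in phases:
        if phase.run_once and phase.name in completed:
            continue  # run-once phase already completed successfully
        if condition_met(phase.condition, project_dir):
            result.append(phase)
    return result
```

In this sketch, completed plays the role of the completed-phase list that the harness stores in session.json.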
Session management:
- Fresh context per session prevents context pollution
- Progress preserved via tracking file (e.g., feature_list.json), session state, and git commits
- Auto-continue after configured delay (default 3s)
- Completion detection: Harness stops when tracker.is_complete() returns true (only json_checklist supports this; notes_file and none require manual stop via Ctrl+C)
- Press Ctrl+C to pause; run same command to resume
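To make the completion check concrete, here is a rough sketch of what an is_complete() test for a JSON checklist could look like. It assumes the tracking file is a JSON array of objects that each carry a boolean "passes" field; the function name checklist_is_complete and the handling of empty or missing files are assumptions of this example, not the harness's confirmed behavior.

```python
import json
from pathlib import Path


def checklist_is_complete(path: Path) -> bool:
    """Return True only when every checklist item has "passes": true."""
    if not path.exists():
        return False  # nothing has been tracked yet
    items = json.loads(path.read_text(encoding="utf-8"))
    # Assumption of this sketch: an empty checklist counts as incomplete.
    return bool(items) and all(item.get("passes") is True for item in items)
```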
Error recovery:
Prevents runaway API costs when sessions fail repeatedly:
- Tracks consecutive errors across sessions
- Exponential backoff: 5s → 10s → 20s → 40s → 80s (circuit breaker trips; max cap 120s)
- Circuit breaker: Trips after 5 consecutive errors (configurable)
- Successful session resets error counter
- Error context forwarded to next session to help recovery
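The backoff schedule and breaker threshold described above follow from simple arithmetic. The sketch below uses the defaults listed in this section (5s initial delay, doubling, 120s cap, 5 consecutive errors); the function names are hypothetical, and whether the harness waits before or after checking the breaker is not something this example tries to model.

```python
def backoff_delay(consecutive_errors: int,
                  initial: float = 5.0,
                  multiplier: float = 2.0,
                  cap: float = 120.0) -> float:
    """Delay in seconds after the given number of consecutive errors."""
    if consecutive_errors < 1:
        return 0.0  # no errors, no wait
    return min(initial * multiplier ** (consecutive_errors - 1), cap)


def circuit_open(consecutive_errors: int, max_consecutive_errors: int = 5) -> bool:
    """True once the error count reaches the configured threshold."""
    return consecutive_errors >= max_consecutive_errors


# Delays for the first six consecutive errors; the sixth is capped.
schedule = [backoff_delay(n) for n in range(1, 7)]
```

A successful session would reset the counter to zero, so the next failure starts again from the initial delay.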
[error_recovery]
max_consecutive_errors = 5
initial_backoff_seconds = 5.0
max_backoff_seconds = 120.0
backoff_multiplier = 2.0
This harness follows Anthropic's secure deployment recommendations by relying on the SDK's built-in sandbox and permission system as the primary defense, rather than custom application-layer validation.
The Claude SDK provides process-level isolation with:
- Process isolation — Bash commands run in a sandboxed subprocess
- Network restrictions — Configurable domain allowlist and Unix socket access
- Filesystem boundaries — Commands are restricted to the project directory
[security.sandbox]
enabled = true
auto_allow_bash_if_sandboxed = true
allow_unsandboxed_commands = false # secure default
[security.sandbox.network]
# Example allowed domains (configure for your project needs)
allowed_domains = ["registry.npmjs.org", "github.com"]
allow_local_binding = false
allow_unix_sockets = []
Security is enforced through SDK permission rules, not runtime command parsing:
[security.permissions]
allow = [
"Bash(npm *)", "Bash(node *)", "Bash(git *)",
"Bash(ls *)", "Bash(cat *)", "Bash(grep *)",
"Read(./**)", "Write(./**)", "Edit(./**)",
]
deny = [
"Bash(curl *)", "Bash(wget *)",
"Read(./.env)", "Read(./.env.*)",
]
Permission rules are evaluated by the SDK before tool execution. The agent cannot bypass these rules through prompt injection or indirect command execution.
- allow_unsandboxed_commands defaults to false
- When sandbox is enabled, auto_allow_bash_if_sandboxed=true auto-allows Bash commands
- When sandbox is disabled, explicit permissions.allow rules are required
- Network access is denied by default
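As a mental model for how allow and deny rules interact, the following simplified sketch matches a tool call against glob-style rules. It is only an illustration: the SDK's real matcher is not shown in this document and may behave differently, and both the use of fnmatch and the assumption that deny rules win over allow rules are choices made for this example.

```python
from fnmatch import fnmatchcase

# A subset of the example rules from the configuration above.
ALLOW = ["Bash(npm *)", "Bash(git *)", "Read(./**)"]
DENY = ["Bash(curl *)", "Read(./.env)", "Read(./.env.*)"]


def is_permitted(tool: str, argument: str, allow=ALLOW, deny=DENY) -> bool:
    """Approximate rule check: deny wins, then allow, otherwise refuse."""
    call = f"{tool}({argument})"
    if any(fnmatchcase(call, rule) for rule in deny):
        return False  # assumed: an explicit deny always takes precedence
    return any(fnmatchcase(call, rule) for rule in allow)
```

With these rules, reading ./.env is refused even though Read(./**) would otherwise match it, which is the reason for listing secrets in deny rather than relying on a narrow allow list.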
For production deployments, protect critical branches using server-side git hooks or branch protection rules on your git hosting platform (GitHub, GitLab, Bitbucket), not client-side validation. This prevents destructive operations like git push --force at the source.
Configuration lives in .agent-harness/config.toml.
project_dir/
├── .agent-harness/
│   ├── logs/           # Session logs (auto-created, gitignored)
│   ├── config.toml     # Main configuration (required)
│   ├── spec.md         # Project specification
│   ├── session.json    # Session number, completed phases (auto-created)
│   └── prompts/        # Prompt files (referenced by config)
│       ├── init.md
│       └── build.md
└── (generated code lives here)
For a complete, annotated configuration reference with detailed comments on all available options, see:
- agent_harness/templates/config.toml: Template with full documentation
- examples/claude-ai-clone/.agent-harness/config.toml: Real-world example
The init command creates a new config using the template:
uv run python -m agent_harness init --project-dir ./my-project
Settings are resolved in this order of precedence: CLI flags > config.toml values > defaults
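The precedence order (CLI flags over config.toml values over defaults) amounts to layered dictionary merging. The sketch below is illustrative only: resolve_settings is a hypothetical helper, and the convention that an unset CLI flag is represented as None is an assumption rather than a description of the harness's cli.py.

```python
def resolve_settings(defaults: dict, config: dict, cli: dict) -> dict:
    """Merge settings so that CLI flags beat config values, which beat defaults."""
    resolved = dict(defaults)
    resolved.update(config)
    # Flags the user did not pass (None) must not override lower layers.
    resolved.update({key: value for key, value in cli.items() if value is not None})
    return resolved


# Placeholder values, not the harness's real defaults.
settings = resolve_settings(
    defaults={"model": "default-model", "max_iterations": 10},
    config={"model": "config-model"},
    cli={"model": None, "max_iterations": 3},
)
```

Here model comes from config.toml because no --model flag was given, while max_iterations comes from the command line.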
claude-agent-harness/
├── agent_harness/              # Python package
│   ├── __init__.py
│   ├── __main__.py             # Entry point
│   ├── cli.py                  # Argument parsing, subcommands
│   ├── client_factory.py       # Builds ClaudeSDKClient from config
│   ├── config.py               # Config loading, validation, HarnessConfig
│   ├── runner.py               # Generic agent loop
│   ├── tracking.py             # Progress tracking implementations
│   └── verify.py               # Setup verification checks
├── examples/
│   ├── claude-ai-clone/
│   │   ├── .agent-harness/
│   │   │   ├── config.toml
│   │   │   ├── spec.md
│   │   │   └── prompts/
│   │   │       ├── init.md
│   │   │       └── build.md
│   │   └── README.md
│   └── simple-calculator/
│       ├── .agent-harness/
│       │   ├── config.toml
│       │   ├── spec.md
│       │   └── prompts/
│       │       ├── init.md
│       │       └── build.md
│       └── README.md
├── tests/
│   ├── test_cli.py
│   ├── test_client_factory.py
│   ├── test_config.py
│   ├── test_prompts.py
│   ├── test_runner.py
│   ├── test_tracking.py
│   └── test_verify.py
├── .env.example
└── pyproject.toml
See examples/claude-ai-clone/ for a complete example that:
- Uses Next.js/React stack (npm, node commands)
- Integrates Puppeteer MCP server for browser testing
- Generates a production-quality chat interface
- Tracks progress via feature_list.json
# Run the Claude.ai clone example
mkdir -p ./my-clone-output
cp -r examples/claude-ai-clone/.agent-harness ./my-clone-output/
uv run python -m agent_harness run --project-dir ./my-clone-output
See examples/simple-calculator/ for a minimal example that:
- Uses Python stdlib only (no external dependencies)
- Completes in ~5 minutes (good for demos)
- Shows basic two-phase workflow (init + build)
- Tracks progress via feature_list.json
The harness expects a .agent-harness/config.toml file in your project directory. If you see this error:
- Check that you're running from the correct directory
- Use uv run python -m agent_harness init --project-dir ./my-project to scaffold a new configuration
Check that all file: references in your config.toml point to files relative to the .agent-harness/ directory:
[[phases]]
prompt = "file:prompts/coding_prompt.md" # Must exist at .agent-harness/prompts/coding_prompt.md
You need authentication credentials to use the Claude API:
- API Key: Get one from console.anthropic.com and set export ANTHROPIC_API_KEY="your-key"
- OAuth Token: Run claude setup-token and the harness will use CLAUDE_CODE_OAUTH_TOKEN automatically
- See .env.example for setting these via environment file
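If you want to confirm from a script which credential is visible to the process, a check along these lines is enough. It is a sketch only: find_auth_variable is a hypothetical helper, the order in which the two variables are tried is an assumption, and the verify command may perform different or additional checks.

```python
import os

# The two environment variables named in this README.
AUTH_VARS = ("ANTHROPIC_API_KEY", "CLAUDE_CODE_OAUTH_TOKEN")


def find_auth_variable(env=None):
    """Return the name of the first non-empty auth variable, or None."""
    env = os.environ if env is None else env
    for name in AUTH_VARS:
        if env.get(name, "").strip():
            return name
    return None
```

Calling find_auth_variable() with no argument inspects the current environment; passing a dict makes the check easy to exercise in tests.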
The first session can take 10-20+ minutes for complex projects as it reads the spec, plans features, creates project structure, and sets up git. This is expected behavior. Subsequent sessions are typically faster.
If a session truly hangs:
- Check .agent-harness/session.json for error messages
- Look for permission prompts or security blocks in the output
- Verify your progress file format matches the configuration (e.g., feature_list.json with "passes": false fields)
# Run all tests
uv run python -m unittest discover tests -v
# Run specific test modules
uv run python -m unittest tests.test_config -v # Configuration loading
uv run python -m unittest tests.test_tracking -v # Progress tracking
uv run python -m unittest tests.test_runner -v # Session loop logic
uv run python -m unittest tests.test_client_factory -v # Client creation
Test coverage includes:
- Security configuration: Sandbox settings, permission rules, network isolation
- Configuration loading: TOML parsing, defaults, validation, error cases
- Progress tracking: Completion detection, JSON parsing, print formatting
- Prompt loading: File reading, file: resolution, error handling
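For readers unfamiliar with the file: resolution mentioned above, the idea is that a prompt value starting with file: is read from a path relative to the .agent-harness/ directory. The sketch below illustrates that rule; load_prompt is a hypothetical name, and treating values without the prefix as inline prompt text is an assumption of this example.

```python
from pathlib import Path

FILE_PREFIX = "file:"


def load_prompt(value: str, harness_dir: Path) -> str:
    """Resolve a prompt value, reading file: references from harness_dir."""
    if not value.startswith(FILE_PREFIX):
        return value  # assumed: anything else is inline prompt text
    path = harness_dir / value[len(FILE_PREFIX):]
    if not path.is_file():
        raise FileNotFoundError(f"prompt file not found: {path}")
    return path.read_text(encoding="utf-8")
```

With this rule, file:prompts/coding_prompt.md resolves to .agent-harness/prompts/coding_prompt.md, matching the troubleshooting note earlier in this README.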
MIT License. See LICENSE.