Agent Harness

A generic, configurable harness for long-running autonomous coding agents. Built on the Claude Agent SDK, it implements Anthropic's guide for effective agent harnesses, featuring phase-driven execution, configurable MCP tools, and SDK-native sandbox isolation.

Features

Agent Harness provides:

Phase-based workflows — declarative phase definitions with conditions and run-once semantics
TOML-based configuration — no code changes needed to customize behavior
SDK-native security — OS sandbox with network isolation, declarative permission rules (allow/deny), secure defaults
Progress tracking — JSON checklist, notes file, or none; automatic completion detection is available with the JSON checklist
Error recovery — exponential backoff and circuit breaker to prevent runaway costs
MCP server support — browser automation, databases, etc.
Session persistence — auto-continue across sessions with state tracking
Setup verification — check auth, tools, config before running

Getting Started

1. Clone and install

git clone <repo-url>
cd claude-agent-harness
uv sync

Alternative using pip

python3 -m venv .venv && source .venv/bin/activate
pip install -e .

If using pip, drop the uv run prefix from all commands below (for example, python -m agent_harness run --project-dir ./my-project).

2. Set up authentication

Export one of these environment variables:

ANTHROPIC_API_KEY — get one from console.anthropic.com
CLAUDE_CODE_OAUTH_TOKEN — via claude setup-token

See .env.example for all options.

Using 1Password CLI

If you manage secrets with 1Password CLI, create a .env file with an op:// reference:

ANTHROPIC_API_KEY="op://Vault/Item/api_key"

Then wrap any command with op run:

op run --env-file "./.env" -- uv run python -m agent_harness run --project-dir ./my-project

3. Run

# Scaffold a new project configuration
uv run python -m agent_harness init --project-dir ./my-project

# Edit the configuration
#    -> ./my-project/.agent-harness/config.toml

# Verify setup
uv run python -m agent_harness verify --project-dir ./my-project

# Run the agent
uv run python -m agent_harness run --project-dir ./my-project

CLI Reference

# Run the agent
uv run python -m agent_harness run --project-dir <path> [options]

# Verify setup (auth, dependencies, config)
uv run python -m agent_harness verify [--project-dir <path>]

# Scaffold new project configuration
uv run python -m agent_harness init --project-dir <path>

# Global flags (all commands)
--project-dir PATH      # Agent's working directory (default: .)

# Run command options
--max-iterations N      # Override max iterations (default: from config)
--model MODEL           # Override model (default: from config)

How It Works

The harness executes agents in configurable phases with conditions and run-once semantics. Each phase gets a fresh Claude SDK session (no context carryover) with a configured prompt.

Phase execution:

Phases run sequentially based on conditions (exists:, not_exists: path checks)
run_once: true phases skip after first successful completion
State persists in .agent-harness/session.json
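
The phase rules above can be sketched in a few lines of Python. This is an illustration only, not the harness's actual code: the Phase dataclass, condition_holds, and runnable_phases are hypothetical names, and whether the real runner executes every matching phase or only the first one is an assumption here.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Phase:
    # Hypothetical in-memory form of one phase entry from config.toml.
    name: str
    conditions: list = field(default_factory=list)
    run_once: bool = False


def condition_holds(condition: str, project_dir: Path) -> bool:
    """Evaluate one 'exists:<path>' or 'not_exists:<path>' check."""
    kind, _, rel_path = condition.partition(":")
    target = project_dir / rel_path.strip()
    if kind == "exists":
        return target.exists()
    if kind == "not_exists":
        return not target.exists()
    raise ValueError(f"unknown condition: {condition!r}")


def runnable_phases(phases, completed, project_dir):
    """Return the names of phases that should run, in declaration order."""
    selected = []
    for phase in phases:
        if phase.run_once and phase.name in completed:
            continue  # run_once phases are skipped after a successful run
        if all(condition_holds(c, project_dir) for c in phase.conditions):
            selected.append(phase.name)
    return selected
```

Here completed stands in for the set of completed phase names that the harness records in session.json.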

Session management:

Fresh context per session prevents context pollution
Progress preserved via tracking file (e.g., feature_list.json), session state, and git commits
Auto-continue after configured delay (default 3s)
Completion detection: Harness stops when tracker.is_complete() returns true (only json_checklist supports this; notes_file and none require manual stop via Ctrl+C)
Press Ctrl+C to pause; run same command to resume
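
As a rough sketch of what completion detection on a JSON checklist might look like: the code below assumes the tracking file is a top-level JSON array of objects, each carrying a boolean "passes" field. The function names and the treatment of an empty checklist are assumptions for illustration, not the harness's implementation.

```python
import json


def is_complete(checklist_text: str) -> bool:
    """True when every item in the checklist has "passes": true."""
    items = json.loads(checklist_text)
    # Assumption: an empty checklist counts as incomplete (nothing planned yet).
    return bool(items) and all(item.get("passes") is True for item in items)


def summarize(checklist_text: str) -> str:
    """Short progress string such as '3/10 passing'."""
    items = json.loads(checklist_text)
    passing = sum(1 for item in items if item.get("passes") is True)
    return f"{passing}/{len(items)} passing"
```

A notes file has no equivalent machine-readable signal, which is consistent with notes_file and none requiring a manual stop.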

Error recovery:

Prevents runaway API costs when sessions fail repeatedly:

Tracks consecutive errors across sessions
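Exponential backoff: 5s → 10s → 20s → 40s → 80s, doubling after each consecutive error and capped at 120s (with the defaults, the circuit breaker trips at the fifth error, before the cap is reached)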
Circuit breaker: Trips after 5 consecutive errors (configurable)
Successful session resets error counter
Error context forwarded to next session to help recovery

[error_recovery]
max_consecutive_errors = 5
initial_backoff_seconds = 5.0
max_backoff_seconds = 120.0
backoff_multiplier = 2.0
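
To make the interaction between the backoff and the circuit breaker concrete, here is a minimal sketch that reuses the [error_recovery] field names as attributes. The ErrorRecovery class and its methods are hypothetical and only illustrate the arithmetic; the harness's own implementation may differ, for example in whether it still waits out the final delay once the breaker has tripped.

```python
from dataclasses import dataclass


@dataclass
class ErrorRecovery:
    max_consecutive_errors: int = 5
    initial_backoff_seconds: float = 5.0
    max_backoff_seconds: float = 120.0
    backoff_multiplier: float = 2.0
    consecutive_errors: int = 0

    def record_success(self) -> None:
        # A successful session resets the error counter.
        self.consecutive_errors = 0

    def record_error(self) -> float:
        """Count one failed session and return the capped backoff delay."""
        self.consecutive_errors += 1
        exponent = self.consecutive_errors - 1
        delay = self.initial_backoff_seconds * self.backoff_multiplier ** exponent
        return min(delay, self.max_backoff_seconds)

    @property
    def tripped(self) -> bool:
        # The breaker trips once the consecutive-error limit is reached.
        return self.consecutive_errors >= self.max_consecutive_errors
```

With the defaults, five consecutive calls to record_error() yield 5, 10, 20, 40, and 80 seconds, and tripped becomes true on the fifth. The 120s cap only comes into play if max_consecutive_errors is raised.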

Security Model

This harness follows Anthropic's secure deployment recommendations by relying on the SDK's built-in sandbox and permission system as the primary defense, rather than custom application-layer validation.

SDK-Native Sandbox

The Claude SDK provides process-level isolation with:

Process isolation — Bash commands run in a sandboxed subprocess
Network restrictions — Configurable domain allowlist and Unix socket access
Filesystem boundaries — Commands are restricted to the project directory

[security.sandbox]
enabled = true
auto_allow_bash_if_sandboxed = true
allow_unsandboxed_commands = false  # secure default

[security.sandbox.network]
# Example allowed domains (configure for your project needs)
allowed_domains = ["registry.npmjs.org", "github.com"]
allow_local_binding = false
allow_unix_sockets = []

Declarative Permission Rules

Security is enforced through SDK permission rules, not runtime command parsing:

[security.permissions]
allow = [
    "Bash(npm *)", "Bash(node *)", "Bash(git *)",
    "Bash(ls *)", "Bash(cat *)", "Bash(grep *)",
    "Read(./**)", "Write(./**)", "Edit(./**)",
]
deny = [
    "Bash(curl *)", "Bash(wget *)",
    "Read(./.env)", "Read(./.env.*)",
]

Permission rules are evaluated by the SDK before tool execution. The agent cannot bypass these rules through prompt injection or indirect command execution.

Secure Defaults

allow_unsandboxed_commands defaults to false
When sandbox is enabled, auto_allow_bash_if_sandboxed=true auto-allows Bash commands
When sandbox is disabled, explicit permissions.allow rules are required
Network access is denied by default

Git Protection Recommendations

For production deployments, protect critical branches using server-side git hooks or branch protection rules on your git hosting platform (GitHub, GitLab, Bitbucket), not client-side validation. This prevents destructive operations like git push --force at the source.

Configuration

Configuration lives in .agent-harness/config.toml.

Directory Layout

project_dir/
├── .agent-harness/
│   ├── logs/                  # Session logs (auto-created, gitignored)
│   ├── config.toml            # Main configuration (required)
│   ├── spec.md                # Project specification
│   ├── session.json           # Session number, completed phases (auto-created)
│   └── prompts/               # Prompt files (referenced by config)
│       ├── init.md
│       └── build.md
└── (generated code lives here)

Configuration Reference

For a complete, annotated configuration reference with detailed comments on all available options, see:

agent_harness/templates/config.toml - Template with full documentation
examples/claude-ai-clone/.agent-harness/config.toml - Real-world example

The init command creates a new config using the template:

uv run python -m agent_harness init --project-dir ./my-project

Config Loading Precedence

CLI flags > config.toml values > defaults
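
One way to picture this precedence is a layered dictionary merge. The sketch below is an assumption about the mechanics rather than the code in config.py; DEFAULTS, its values, and resolve_settings are hypothetical.

```python
# Hypothetical built-in defaults; the real values live in the harness itself.
DEFAULTS = {"model": "default-model", "max_iterations": 10}


def resolve_settings(config_values: dict, cli_overrides: dict) -> dict:
    """Merge settings so that CLI flags beat config.toml, which beats defaults."""
    merged = dict(DEFAULTS)
    merged.update(config_values)
    # A CLI flag that was not passed is represented as None and is ignored.
    merged.update({k: v for k, v in cli_overrides.items() if v is not None})
    return merged
```

For example, passing --model on the command line would replace the model from config.toml for that run only, while max_iterations would still come from the file.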

Project Structure

claude-agent-harness/
├── agent_harness/          # Python package
│   ├── __init__.py
│   ├── __main__.py         # Entry point
│   ├── cli.py              # Argument parsing, subcommands
│   ├── client_factory.py   # Builds ClaudeSDKClient from config
│   ├── config.py           # Config loading, validation, HarnessConfig
│   ├── runner.py           # Generic agent loop
│   ├── tracking.py         # Progress tracking implementations
│   └── verify.py           # Setup verification checks
├── examples/
│   ├── claude-ai-clone/
│   │   ├── .agent-harness/
│   │   │   ├── config.toml
│   │   │   ├── spec.md
│   │   │   └── prompts/
│   │   │       ├── init.md
│   │   │       └── build.md
│   │   └── README.md
│   └── simple-calculator/
│       ├── .agent-harness/
│       │   ├── config.toml
│       │   ├── spec.md
│       │   └── prompts/
│       │       ├── init.md
│       │       └── build.md
│       └── README.md
├── tests/
│   ├── test_cli.py
│   ├── test_client_factory.py
│   ├── test_config.py
│   ├── test_prompts.py
│   ├── test_runner.py
│   ├── test_tracking.py
│   └── test_verify.py
├── .env.example
└── pyproject.toml

Examples

Claude.ai Clone (Next.js)

See examples/claude-ai-clone/ for a complete example that:

Uses Next.js/React stack (npm, node commands)
Integrates Puppeteer MCP server for browser testing
Generates a production-quality chat interface
Tracks progress via feature_list.json

# Run the Claude.ai clone example
mkdir -p ./my-clone-output
cp -r examples/claude-ai-clone/.agent-harness ./my-clone-output/
uv run python -m agent_harness run --project-dir ./my-clone-output

Simple Calculator (Python)

See examples/simple-calculator/ for a minimal example that:

Uses Python stdlib only (no external dependencies)
Completes in ~5 minutes (good for demos)
Shows basic two-phase workflow (init + build)
Tracks progress via feature_list.json

Troubleshooting

"Configuration file not found"

The harness expects a .agent-harness/config.toml file in your project directory. If you see this error:

Check that you're running from the correct directory
Use uv run python -m agent_harness init --project-dir ./my-project to scaffold a new configuration

"Prompt file not found"

Check that all file: references in your config.toml point to files relative to the .agent-harness/ directory:

[[phases]]
prompt = "file:prompts/coding_prompt.md"  # Must exist at .agent-harness/prompts/coding_prompt.md
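
The resolution rule can be sketched as follows. resolve_prompt is a hypothetical helper, and treating a value without the file: prefix as inline prompt text is an assumption; only the rule that file: paths are relative to .agent-harness/ comes from this README.

```python
from pathlib import Path


def resolve_prompt(value: str, harness_dir: Path) -> str:
    """Return prompt text, loading it from disk for 'file:' references."""
    prefix = "file:"
    if not value.startswith(prefix):
        return value  # assumption: anything else is inline prompt text
    # file: paths are resolved relative to the .agent-harness/ directory.
    path = harness_dir / value[len(prefix):]
    if not path.is_file():
        raise FileNotFoundError(f"Prompt file not found: {path}")
    return path.read_text(encoding="utf-8")
```

If the error appears, comparing the path in the message against the prompts/ directory usually reveals a typo or a path written relative to the project root instead of .agent-harness/.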

"Neither ANTHROPIC_API_KEY nor CLAUDE_CODE_OAUTH_TOKEN is set"

You need authentication credentials to use the Claude API:

API Key: Get one from console.anthropic.com and set export ANTHROPIC_API_KEY="your-key"
OAuth Token: Run claude setup-token and the harness will use CLAUDE_CODE_OAUTH_TOKEN automatically
See .env.example for setting these via environment file

Agent is hanging on the first session

The first session can take 10-20+ minutes for complex projects as it reads the spec, plans features, creates project structure, and sets up git. This is expected behavior. Subsequent sessions are typically faster.

If a session truly hangs:

Check .agent-harness/session.json for error messages
Look for permission prompts or security blocks in the output
Verify your progress file format matches the configuration (e.g., feature_list.json with "passes": false fields)

Running Tests

# Run all tests
uv run python -m unittest discover tests -v

# Run specific test modules
uv run python -m unittest tests.test_config -v         # Configuration loading
uv run python -m unittest tests.test_tracking -v       # Progress tracking
uv run python -m unittest tests.test_runner -v         # Session loop logic
uv run python -m unittest tests.test_client_factory -v # Client creation

Test coverage includes:

Security configuration: Sandbox settings, permission rules, network isolation
Configuration loading: TOML parsing, defaults, validation, error cases
Progress tracking: Completion detection, JSON parsing, print formatting
Prompt loading: File reading, file: resolution, error handling

License

MIT License. See LICENSE.
