Phase 0 Prototype: Core framework demonstrating intelligent reasoning optimization and cost reduction
Deep Agent One is a Phase 0 prototype deep agent framework built on LangGraph DeepAgents with Gemini 3 Pro as the primary model and GPT-5.1 as automatic fallback. This prototype demonstrates the core framework capabilities with basic features.
What This Prototype Provides:
- Dual-Model Stack: Gemini 3 Pro (primary) with GPT-5.1 automatic fallback on errors
- Prompt Optimization: Opik integration with 6 optimization algorithms (Hierarchical, Bayesian, Evolutionary, MetaPrompt, GEPA, Parameter)
- Human-in-the-Loop: Built-in approval workflows for critical decisions (Phase 0 stub)
- Real-time Transparency: AG-UI protocol for streaming events and tool call visibility
- Web Search: Perplexity MCP integration for web queries
- File Operations: Read, write, edit, and list files
- Observability: LangSmith tracing for debugging and monitoring
Phase 0 Status:
This is a working prototype demonstrating core framework capabilities. Future phases will add production-grade features like memory systems, user authentication, deep research, OCR, and RAG.
Chat Interface
- Real-time streaming responses with token-by-token display
- Conversation history and context tracking
- WebSocket-based bidirectional communication
Dual-Model Architecture
- Primary: Google Gemini 3 Pro with configurable thinking depth (low/medium/high)
- Fallback: OpenAI GPT-5.1 with automatic failover on rate limits, timeouts, or errors
- Intelligent model routing with seamless switching (Phase 0 stub)
- Per-request cost tracking and model attribution
Opik Prompt Optimization (Available)
- 6 optimization algorithms: Hierarchical, Bayesian, Evolutionary, MetaPrompt, GEPA, Parameter
- Prompt analysis against GPT best practices
- A/B testing infrastructure for prompt comparison
- Automatic evaluation dataset generation
Reasoning Effort Routing (Phase 0 Stub)
- Infrastructure for 4 effort levels: minimal, low, medium, high
- Phase 0: Fixed at HIGH effort (dynamic routing coming in Phase 1)
- Configurable timeouts per effort level
- Trigger phrase detection (Phase 1)
Web Search
- Perplexity MCP integration for web queries
- Search results with source citations
- Seamless integration into agent responses
Human-in-the-Loop (HITL)
- Built-in approval workflow for critical operations
- Three response options: Accept, Respond (with feedback), Edit
- Real-time approval requests in UI
- Conversation continuity after approval/rejection
File System Tools
- ls: List directory contents
- read_file: Read file contents
- write_file: Create or overwrite files
- edit_file: Modify existing files
Planning & Task Tracking
- TodoWrite-based planning tool
- Real-time task list updates in UI
- Subtask progress visualization
Agent Transparency
- Real-time tool call display (args and results)
- Agent state visualization
- Step-by-step execution tracking
- "Inspect source" toggle for detailed tool information
Monitoring & Observability
- LangSmith tracing for all agent operations
- Structured logging (JSON in production, human-readable locally)
- Error tracking with trace IDs
- Cost tracking per request
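The dual-format logging described above can be wired with structlog. A minimal sketch, assuming a hypothetical `APP_ENV` variable selects the renderer (the real configuration may differ):

```python
import os
import structlog

# Assumption: APP_ENV is an illustrative variable name, not the project's.
is_production = os.getenv("APP_ENV") == "production"

structlog.configure(
    processors=[
        structlog.processors.add_log_level,          # attach "level" to each event
        structlog.processors.TimeStamper(fmt="iso"), # ISO-8601 timestamps
        # JSON lines in production, human-readable key-value output locally
        structlog.processors.JSONRenderer()
        if is_production
        else structlog.dev.ConsoleRenderer(),
    ],
)

log = structlog.get_logger()
log.info("request_completed", trace_id="abc123", cost_usd=0.0042)
```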
┌─────────────────────────────────────────────────────────────────┐
│                       FRONTEND (Next.js)                        │
│  ┌──────────────┐   ┌──────────────┐   ┌───────────────────┐    │
│  │   Chat UI    │   │     HITL     │   │     Reasoning     │    │
│  │   (AG-UI)    │   │   Approval   │   │     Dashboard     │    │
│  └──────┬───────┘   └──────┬───────┘   └─────────┬─────────┘    │
│         │                  │                     │              │
│         └────WebSocket─────┴─────────────────────┘              │
└────────────────────────────┬────────────────────────────────────┘
                             │
┌────────────────────────────┴────────────────────────────────────┐
│                        BACKEND (FastAPI)                        │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                   API Layer (/api/v1/)                    │  │
│  │   • Agents  • Chat  • Reasoning  • WebSocket Streaming    │  │
│  └─────────────────────────────┬─────────────────────────────┘  │
│                                │                                │
│  ┌─────────────────────────────┴─────────────────────────────┐  │
│  │                      Services Layer                       │  │
│  │   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐  │  │
│  │   │    Agent     │   │     LLM      │   │  Reasoning   │  │  │
│  │   │   Service    │   │   Factory    │   │    Router    │  │  │
│  │   └──────┬───────┘   └──────┬───────┘   └──────┬───────┘  │  │
│  └──────────┼──────────────────┼──────────────────┼──────────┘  │
│             │                  │                  │             │
│  ┌──────────┴──────────────────┴──────────────────┴──────────┐  │
│  │                    LangGraph DeepAgents                   │  │
│  │    • Planning Tool  • File Tools  • Sub-Agents  • HITL    │  │
│  └─────────────────────────────┬─────────────────────────────┘  │
│                                │                                │
│  ┌─────────────────────────────┴─────────────────────────────┐  │
│  │                   External Integrations                   │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │  │
│  │  │  Google  │  │  OpenAI  │  │Perplexity│  │LangSmith │   │  │
│  │  │  Gemini  │  │ GPT-5.1  │  │   MCP    │  │  Traces  │   │  │
│  │  │(Primary) │  │(Fallback)│  │          │  │          │   │  │
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘   │  │
│  └───────────────────────────────────────────────────────────┘  │
└────────────────────────────────┬────────────────────────────────┘
                                 │
┌────────────────────────────────┴────────────────────────────────┐
│                      DATA LAYER (Phase 0)                       │
│  ┌──────────────────┐                                           │
│  │      SQLite      │  Checkpointer for agent state             │
│  │   Checkpointer   │  (conversation history, HITL state)       │
│  └──────────────────┘                                           │
└─────────────────────────────────────────────────────────────────┘
Primary model (Gemini 3 Pro) with automatic failover to GPT-5.1:
- Rate limit handling: Seamless switch on 429 errors
- Timeout recovery: Fallback on slow responses
- Connection resilience: Retry with alternate model
- Transparent switching: User sees consistent experience
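A minimal sketch of how such a failover chain can be expressed with LangChain's `with_fallbacks`; the model identifiers here are illustrative assumptions, not the project's verbatim configuration:

```python
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI

# Assumption: model names are illustrative; check each provider's current IDs.
primary = ChatGoogleGenerativeAI(model="gemini-3-pro")
fallback = ChatOpenAI(model="gpt-5.1")

# with_fallbacks invokes the fallback model when the primary raises
# (rate limits, timeouts, and connection errors all surface as exceptions).
llm = primary.with_fallbacks([fallback])

response = llm.invoke("Summarize the dual-model architecture in one sentence.")
print(response.content)
```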
Infrastructure for dynamic routing (full implementation in Phase 1):
- Minimal/Low/Medium/High effort levels defined
- Phase 0: Fixed at HIGH (stub implementation)
- Phase 1: Trigger phrase detection + complexity analysis
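A sketch of what the stub routing layer might look like; the names and timeout values below are illustrative assumptions, not the project's actual code:

```python
from enum import Enum


class ReasoningEffort(str, Enum):
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Assumption: per-effort timeouts are illustrative placeholders.
EFFORT_TIMEOUTS_S = {
    ReasoningEffort.MINIMAL: 10,
    ReasoningEffort.LOW: 30,
    ReasoningEffort.MEDIUM: 60,
    ReasoningEffort.HIGH: 120,
}


def route_effort(query: str) -> ReasoningEffort:
    """Phase 0 stub: always HIGH. Phase 1 will add trigger-phrase
    detection and complexity analysis of the query text."""
    return ReasoningEffort.HIGH
```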
Real-time streaming of agent events to frontend:
- Lifecycle events (RunStarted, StepStarted, etc.)
- Text message streaming (token-by-token)
- Tool call transparency (args, results)
- HITL approval workflows
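A skeletal FastAPI WebSocket endpoint illustrating how such events could be relayed. The event names follow the AG-UI lifecycle above, but `run_agent` and the payload shapes are simplified stand-ins, not the project's real agent loop:

```python
from collections.abc import AsyncIterator

from fastapi import FastAPI, WebSocket

app = FastAPI()


async def run_agent(message: str) -> AsyncIterator[tuple[str, dict]]:
    """Hypothetical stand-in for the agent loop: yields AG-UI-style
    (event_type, payload) pairs. The real loop wraps DeepAgents."""
    yield "RunStarted", {}
    yield "TextMessageContent", {"delta": f"Echo: {message}"}
    yield "RunFinished", {}


@app.websocket("/api/v1/ws")
async def stream_agent_events(ws: WebSocket) -> None:
    await ws.accept()
    user_message = await ws.receive_text()
    # Relay each agent event to the client as one JSON frame.
    async for event_type, payload in run_agent(user_message):
        await ws.send_json({"type": event_type, "data": payload})
    await ws.close()
```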
Human-in-the-loop approvals with three options:
- Accept: Approve without changes
- Respond: Reject with feedback
- Edit: Approve with modifications
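A sketch of how the three decision types might be modeled on the backend; the class and field names are illustrative assumptions:

```python
from dataclasses import dataclass
from enum import Enum


class HITLDecision(str, Enum):
    ACCEPT = "accept"    # approve without changes
    RESPOND = "respond"  # reject, with feedback for the agent
    EDIT = "edit"        # approve a modified version of the action


@dataclass
class HITLResponse:
    decision: HITLDecision
    feedback: str | None = None        # used with RESPOND
    edited_action: dict | None = None  # used with EDIT


# Example: reject a file deletion and ask for a backup first.
reply = HITLResponse(HITLDecision.RESPOND, feedback="Please back it up first")
```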
State persistence across agent interactions:
- Phase 0: SQLite for local development
- Enables conversation continuity and HITL workflows
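LangGraph's SQLite checkpointer (from the `langgraph-checkpoint-sqlite` package) can provide this persistence. A minimal sketch; the database path is an assumption:

```python
import sqlite3

from langgraph.checkpoint.sqlite import SqliteSaver

# Assumption: "checkpoints.db" is an illustrative path.
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)

# When compiling a LangGraph graph, pass the checkpointer so every
# conversation turn (and any pending HITL interrupt) is persisted:
#   graph = builder.compile(checkpointer=checkpointer)
# Each thread_id in the run config then maps to a resumable conversation.
```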
Backend:
- Framework: LangGraph DeepAgents
- LLM: Google Gemini 3 Pro (primary) + OpenAI GPT-5.1 (fallback)
- Prompt Optimization: Opik (6 algorithms)
- API: FastAPI (async Python 3.10+)
- State Management: SQLite checkpointer
- Monitoring: LangSmith
Frontend:
- Framework: Next.js 14+ (App Router)
- UI: shadcn/ui + Tailwind CSS
- State: Zustand
- Protocol: AG-UI for event streaming
- WebSocket: Real-time bidirectional communication
External Integrations:
- Google Gemini: Primary LLM (Gemini 3 Pro)
- OpenAI: Fallback LLM (GPT-5.1)
- Opik: Prompt optimization and A/B testing
- Perplexity MCP: Web search
- LangSmith: Agent tracing and observability
- Python 3.10+
- Node.js 18+
- Git
- API Keys: Google (Gemini), OpenAI (fallback), Perplexity, LangSmith (optional)
# 1. Clone repository
git clone https://github.com/yourusername/deep-agent-one.git
cd deep-agent-one
# 2. Install Poetry (Python dependency manager)
curl -sSL https://install.python-poetry.org | python3 -
# 3. Install Python dependencies
poetry install
# 4. Install Node.js dependencies
npm install
# 5. Set up environment variables
cp .env.example .env
# Edit .env with your API keys:
# - GOOGLE_API_KEY (required - Gemini 3 Pro, primary model)
# - OPENAI_API_KEY (required - GPT-5.1 fallback)
# - PERPLEXITY_API_KEY (required for web search)
# - LANGSMITH_API_KEY (optional, for tracing)
# 6. Start the servers
./scripts/start-all.sh

Access Points:
- Backend API: http://127.0.0.1:8000
- Frontend UI: http://localhost:3000
- API Docs: http://127.0.0.1:8000/docs
This prototype is optimized for deployment on Replit:
One-Click Deploy:
- Fork this repository on Replit
- Configure Secrets in Replit dashboard:
  - GOOGLE_API_KEY (Gemini 3 Pro - primary)
  - OPENAI_API_KEY (GPT-5.1 - fallback)
  - PERPLEXITY_API_KEY
  - LANGSMITH_API_KEY (optional)
- Click "Run" - Replit automatically:
- Installs Python/Node dependencies
- Starts backend (FastAPI) on port 8000
- Starts frontend (Next.js) on port 3000
Environment Configuration:
# Replit automatically loads secrets from the Secrets tab
# No .env file needed on Replit
# Verify deployment:
curl https://your-repl-name.replit.app/health

Required:
GOOGLE_API_KEY=... # Google API key for Gemini 3 Pro (primary)
OPENAI_API_KEY=sk-... # OpenAI API key for GPT-5.1 (fallback)
PERPLEXITY_API_KEY=pplx-... # Perplexity API key for web search
LANGSMITH_API_KEY=ls__...   # LangSmith tracing (recommended)

See .env.example for complete configuration template.
Starting a Conversation:
- Navigate to http://localhost:3000 (or your Replit URL)
- Type your message in the chat input
- Press Enter or click Send
- Watch the agent respond in real-time (token-by-token streaming)
Understanding the Model Stack:
Deep Agent One uses a dual-model architecture:
- Primary: Google Gemini 3 Pro with configurable thinking depth
- Fallback: OpenAI GPT-5.1 (automatic on errors/rate limits)
The framework automatically switches to the fallback if the primary model fails, ensuring reliable responses via LangChain middleware.
Reasoning Effort (Phase 0):
Note: In Phase 0, reasoning effort is fixed at HIGH. Dynamic routing based on query complexity will be added in Phase 1.
Infrastructure exists for 4 effort levels:
- Minimal (⚡): Fast, simple queries (Phase 1)
- Low (💡): Basic explanations (Phase 1)
- Medium (🧠): Moderate analysis (Phase 1)
- High (🔍): Current default for all queries
Coming in Phase 1:
- Trigger phrase detection ("think harder about...")
- Query complexity analysis
- Dynamic effort routing
When the agent needs approval:
1. Approval Request Appears: Agent pauses and displays the action requiring approval
2. Three Options:
   - Accept: Proceed with the action as-is
   - Respond: Reject and provide feedback (agent will adjust)
   - Edit: Modify the action, then approve
3. Response: Agent continues based on your decision
Example HITL Scenario:
Agent: "I need approval to delete file: important-data.txt"
Options:
[ Accept ]
[ Respond ]
[ Edit ]
You: [Respond] "Please back it up first"
Agent: "I'll create a backup at important-data.txt.bak first"
Tool Call Transparency:
Click "Inspect" next to any tool call response to see:
- Tool name and invocation time
- Input arguments (JSON format)
- Output results (Phase 1)
- Execution duration (Phase 1)
Example Tool Call Display:
Tool: web_search
Args: { "query": "latest Python news" }
Result: [5 search results with titles and URLs]
Duration: 1.2s
Task Progress:
Watch the subtask list (left sidebar) for:
- Current task being executed
- Completed tasks (checked off)
- Pending tasks (queued)
Per-Request Costs:
Each response shows:
- Tokens used (input + output)
- Reasoning effort level
- Estimated cost in USD
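A sketch of a per-request cost estimator. The per-token rates below are illustrative placeholders, not real Gemini or GPT-5.1 pricing:

```python
# Illustrative placeholder rates in USD per 1M tokens -- NOT real pricing.
PRICING_PER_1M = {
    "gemini-3-pro": {"input": 2.00, "output": 8.00},
    "gpt-5.1": {"input": 3.00, "output": 12.00},
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Linear cost model: tokens times the per-million-token rate."""
    rates = PRICING_PER_1M[model]
    return (
        input_tokens * rates["input"] + output_tokens * rates["output"]
    ) / 1_000_000


# Example: 1,200 input + 800 output tokens on the primary model -> $0.0088.
print(f"${estimate_cost_usd('gemini-3-pro', 1200, 800):.4f}")
```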
Cost Optimization Tips:
- Use simple queries when possible (triggers minimal reasoning once Phase 1 routing lands)
- Avoid trigger phrases unless needed
- Break complex queries into simpler sub-queries
Phase 0 Limits:
- Per IP: 60 requests/minute
- WebSocket connections: 5 concurrent per IP
- Message size: 100KB max
Rate Limit Headers:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1634567890
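A hypothetical client-side sketch that retries on 429 using the `retry_after` value from the error body shown below (the endpoint path is an assumption):

```python
import time

import requests


def post_with_backoff(url: str, payload: dict) -> requests.Response:
    """Retry on 429, waiting for the server-advertised retry window."""
    while True:
        resp = requests.post(url, json=payload, timeout=30)
        if resp.status_code != 429:
            return resp
        # Prefer the JSON body's retry_after; default to 60 seconds.
        wait = resp.json().get("retry_after", 60)
        time.sleep(wait)


# Example (hypothetical endpoint path):
# post_with_backoff("http://127.0.0.1:8000/api/v1/chat", {"message": "hi"})
```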
Rate Limit Exceeded Response:
{
"error": "Rate limit exceeded",
"retry_after": 60
}

| Query Type | Reasoning Effort | Expected Latency | Example |
|---|---|---|---|
| Simple | Minimal | TBD | "What is 2+2?" |
| Basic | Low | TBD | "What is Python?" |
| Moderate | Medium | TBD | "Explain OOP" |
| Complex | High | TBD | "Analyze quantum computing" |
Note: Response times include network latency, LLM inference, and streaming overhead.
Prototype Constraints:
- No memory system: Each conversation is independent (no long-term memory)
- No authentication: Open access (secure your deployment)
- No rate limiting per user: Only per-IP limits
- SQLite checkpointer: Not suitable for high concurrency
- No multi-user: Designed for single-user testing
- Reasoning fixed at HIGH: Dynamic effort routing coming in Phase 1
- Opik not integrated: Prompt optimization tools available but not in agent runtime
Model costs vary:
- Gemini 3 Pro (Primary): Generally lower cost per token
- GPT-5.1 (Fallback): Used only on primary model failures
Cost factors:
- Query complexity
- Response length
- Tool usage (web search adds $0.01-0.05)
- Model used (Gemini vs GPT-5.1)
Cost Optimization:
- Gemini 3 Pro is preferred for cost efficiency
- Cache common queries (Phase 1)
- Use specific, targeted questions
Rate Limiting:
- Per-IP request limits (60/minute)
- Concurrent WebSocket connection limits (5/IP)
- Protection against DoS attacks
Input Validation:
- Message size limits (100KB max, matching the Phase 0 rate limits above)
- JSON schema validation
- SQL injection prevention (parameterized queries)
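A sketch of the message-size and schema checks with Pydantic; the model and field names are illustrative assumptions, while the 100KB constant mirrors the Phase 0 limit above:

```python
from pydantic import BaseModel, Field, field_validator

MAX_MESSAGE_BYTES = 100 * 1024  # 100KB, matching the Phase 0 limit


class ChatMessage(BaseModel):
    # Assumption: field names are illustrative, not the project's schema.
    message: str = Field(min_length=1)
    thread_id: str | None = None

    @field_validator("message")
    @classmethod
    def enforce_size_limit(cls, v: str) -> str:
        if len(v.encode("utf-8")) > MAX_MESSAGE_BYTES:
            raise ValueError("message exceeds 100KB limit")
        return v
```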
Security Headers:
- CORS whitelist (environment-specific)
- X-Content-Type-Options: nosniff
- X-Frame-Options: DENY
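How these headers and the CORS whitelist might be wired in FastAPI; the origin list is an illustrative assumption:

```python
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Assumption: origins are environment-specific; these are examples.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)


@app.middleware("http")
async def security_headers(request: Request, call_next):
    # Attach hardening headers to every HTTP response.
    response = await call_next(request)
    response.headers["X-Content-Type-Options"] = "nosniff"
    response.headers["X-Frame-Options"] = "DENY"
    return response
```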
Phase 0 Prototype:
- No persistent user data storage
- Conversation state in SQLite (local only)
- LangSmith traces (optional, can be disabled)
Data Handling:
- API keys stored in environment variables only
- No hardcoded secrets in codebase
- Logs do not contain API keys or sensitive data
For Public Deployments:
- Add authentication: Implement API key or OAuth before public deployment
- Use HTTPS: Always deploy with SSL/TLS certificates
- Firewall rules: Restrict access to known IPs if possible
- Monitor logs: Watch for suspicious activity patterns
- Rotate API keys: Change Google/OpenAI/Perplexity keys regularly
Environment Variables:
# Never commit these to version control
GOOGLE_API_KEY=...
OPENAI_API_KEY=...
PERPLEXITY_API_KEY=...
LANGSMITH_API_KEY=...

This is a Phase 0 prototype. Contributions are welcome for:
- Bug fixes
- Documentation improvements
- Test coverage
- Performance optimizations
Development Setup:
See CLAUDE.md for complete development guide, including:
- TDD workflow (write tests first)
- Commit discipline (semantic commits)
- Pre-commit checks (linting, type checking)
- Security scanning (TheAuditor)
Quick Start for Contributors:
# Fork and clone
git clone https://github.com/yourusername/deep-agent-one.git
# Install development dependencies
poetry install
npm install
# Run tests
pytest --cov
# Make changes, write tests, commit
git commit -m "feat(phase-0): add feature X"

Apache 2.0 License - see LICENSE for details
Phase 0 Prototype - Demonstrating core framework capabilities with dual-model architecture and prompt optimization.
Built with Claude Code using LangGraph DeepAgents, Gemini 3 Pro, GPT-5.1, and Opik.
Roadmap:
- Phase 1: Production features (memory, auth, dynamic reasoning, Opik integration)
- Phase 2: Deep research capabilities and infrastructure hardening