A comprehensive observability solution for monitoring Claude Code usage, performance, and costs. This setup implements the recommendations from the Claude Code Observability Documentation to provide deep insights into AI-assisted development workflows.
The main operations dashboard with comprehensive visibility into sessions, costs, tool usage, performance, and real-time event logs.
Sections: Overview stats, Cost & Usage Analysis, Tool Usage & Performance, Performance & Errors, User Activity & Productivity, Event Logs
Executive cockpit view with hero stats, activity timelines, tool breakdown, code velocity, and cost intelligence.
Sections: Hero Stats, Activity Timeline, What Claude Did, Cost Intelligence, Live Activity
Deep-dive into token consumption patterns, model distribution, session analysis, and cache efficiency metrics.
Sections: Overview, Token Usage Over Time, Model Analysis, Session Analysis, Cache Intelligence
- Cost Analysis: Track usage costs by model, user, and time periods
- User Analytics: Daily/Weekly/Monthly Active Users (DAU/WAU/MAU)
- Tool Usage: Monitor which Claude Code tools are used most frequently
- Performance Metrics: API latency, success rates, and bottleneck identification
- Productivity Insights: Lines of code changes, commits, and pull requests
- API Request Tracking: Monitor actual request counts by model version
- Token Efficiency: Track cost-per-token across different models
- Session Analytics: Comprehensive session and productivity tracking
- Real-time Monitoring: Live dashboards with 30-second refresh rates
- Executive Overview: High-level KPIs and trends
- Cost Management: Detailed cost breakdowns and projections
- Tool Performance: Success rates and execution times
- User Activity: Productivity and engagement metrics
- Error Analysis: Comprehensive error tracking and investigation
Claude Code → OpenTelemetry Collector → Prometheus (metrics) + Loki (events/logs)
↓
Grafana (visualization & analysis)
| Service | Purpose | Port | UI |
|---|---|---|---|
| OpenTelemetry Collector | Metrics/logs ingestion | 4317 (gRPC), 4318 (HTTP) | - |
| Prometheus | Metrics storage & querying | 9090 | http://localhost:9090 |
| Loki | Log aggregation & storage | 3100 | - |
| Grafana | Dashboards & visualization | 3000 | http://localhost:3000 |
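
Once the stack is running, the ports in the table above can be probed with a short script. The following is a minimal sketch, assuming the default local ports and the standard health endpoints of each project (`/-/healthy` for Prometheus, `/ready` for Loki, `/api/health` for Grafana). The function names `stack_health_urls` and `check_stack` are illustrative and not part of this repository.

```shell
# List the health endpoints for the services in the table above.
# Host and ports assume the default local setup.
stack_health_urls() {
  printf '%s\n' \
    "prometheus http://localhost:9090/-/healthy" \
    "loki http://localhost:3100/ready" \
    "grafana http://localhost:3000/api/health"
}

# Probe each endpoint once and report whether it answered. Requires curl.
check_stack() {
  stack_health_urls | while read -r name url; do
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "$name: up"
    else
      echo "$name: DOWN ($url)"
    fi
  done
}
```

Calling `check_stack` after `make up` prints one line per service. The collector is left out because it exposes no UI port in the table above.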
# Start all services
make up
# Check status
make status
# Enable telemetry
export CLAUDE_CODE_ENABLE_TELEMETRY=1
# Configure exporters
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# For debugging (faster export intervals)
export OTEL_METRIC_EXPORT_INTERVAL=10000
export OTEL_LOGS_EXPORT_INTERVAL=5000
# Run Claude Code
claude
- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
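
The telemetry variables above can be wrapped in a small helper so that switching between fast debug intervals and slower production intervals is a single argument. This is a sketch only. `claude_otel_env` is a hypothetical name, and the 60000 ms production value is taken from the configuration reference in this README.

```shell
# Print export lines for a telemetry profile.
# "debug" uses the 10 s metric interval from the quick start; any other
# value falls back to the 60 s interval listed for production.
claude_otel_env() {
  profile="${1:-production}"
  case "$profile" in
    debug) metric_interval=10000 ;;
    *)     metric_interval=60000 ;;
  esac
  cat <<EOF
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_METRIC_EXPORT_INTERVAL=$metric_interval
export OTEL_LOGS_EXPORT_INTERVAL=5000
EOF
}
```

A typical use would be `eval "$(claude_otel_env debug)"` followed by `claude`. Printing the lines rather than exporting them directly makes it easy to review the settings before applying them.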
Visual Guide: Check out the Dashboard Screenshots to see what your dashboards will look like!
Based on the Claude Code Observability Documentation, this stack monitors:
- claude_code.session.count - CLI sessions started
- claude_code.lines_of_code.count - Lines of code modified (added/removed)
- claude_code.pull_request.count - Pull requests created
- claude_code.commit.count - Git commits created
- claude_code.cost.usage - Cost of sessions by model
- claude_code.token.usage - Token usage (input/output/cache/creation)
- claude_code.code_edit_tool.decision - Tool permission decisions

- claude_code.user_prompt - User prompt submissions
- claude_code.tool_result - Tool execution results and timings
- claude_code.api_request - API requests with duration and tokens
- claude_code.api_error - API errors with status codes
- claude_code.tool_decision - Tool permission decisions
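
The metric names above use OpenTelemetry's dotted form, while Prometheus stores names with underscores. The sketch below shows one way to translate a name and send an instant query to the Prometheus HTTP API. It assumes the stack's default Prometheus address. `PROM_URL`, `otel_to_prom_name`, and `prom_query` are hypothetical names, and the exporter may append unit or `_total` suffixes that this helper does not add.

```shell
# Convert an OpenTelemetry metric name to a Prometheus-style name by
# replacing dots with underscores. Suffixes that the exporter may append
# (for example a unit or _total) are NOT handled here.
otel_to_prom_name() {
  printf '%s\n' "$1" | tr '.' '_'
}

# Run an instant query against the Prometheus HTTP API. Requires curl.
# PROM_URL is a hypothetical override; it defaults to the local stack.
prom_query() {
  curl -fsS -G "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}
```

Because the stored names depend on the collector configuration, it is worth listing them first with `curl -s http://localhost:9090/api/v1/label/__name__/values` and then passing the exact name to `prom_query`.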
Access comprehensive analytics through the Grafana dashboard at http://localhost:3000:
- Cost Analysis: Real-time cost tracking with model breakdowns
- Request Monitoring: API request counts and patterns by model
- Token Efficiency: Track token usage and cost-per-token metrics
- Tool Performance: Success rates and execution time analysis
- Session Analytics: User activity and productivity insights
- Total and per-model costs with trending
- API request counts independent of cost variations
- Token usage breakdown (input/output/cache/creation)
- Tool usage patterns and success rates
- Session activity and code productivity metrics
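
The cost-per-token figure mentioned above is a simple ratio of total cost to total tokens. As a worked sketch, the helper below computes cost per 1,000 tokens from two numbers you might read off the dashboard. It assumes cost is reported in USD, and `cost_per_1k_tokens` is an illustrative name rather than something shipped with this stack.

```shell
# Cost per 1,000 tokens from a total cost (assumed USD) and a token count.
# Prints "n/a" when the token count is zero to avoid dividing by zero.
cost_per_1k_tokens() {
  awk -v cost="$1" -v tokens="$2" 'BEGIN {
    if (tokens + 0 == 0) { print "n/a"; exit }
    printf "%.4f\n", cost / tokens * 1000
  }'
}
```

For example, `cost_per_1k_tokens 1.50 300000` prints `0.0050`, meaning half a cent per thousand tokens for that period.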
See Dashboard Screenshots above for visual examples
- Cost by Model: Track spending across different Claude models
- API Request Tracking: Monitor actual request counts by model version
- Token Usage Breakdown: Detailed analysis by token type (input/output/cache)
- Usage Patterns: Most frequently used Claude Code tools
- Success Rates: Tool execution success percentages
- Performance Metrics: Average execution times and bottleneck identification
- Live Metrics: 30-second refresh rate for current activity
- Session Tracking: Active sessions and productivity metrics
- Error Analysis: API errors and troubleshooting information
Three specialized dashboards are included for different analysis needs:
The main operations dashboard for day-to-day monitoring:
- Overview: Active sessions, cost, token usage, lines of code
- Cost & Usage Analysis: Cost trends by model, token usage breakdown, API request tracking
- Tool Usage & Performance: Tool frequency, success rates, cumulative usage
- Performance & Errors: API latency by model, error rate tracking
- User Activity & Productivity: Code changes, commits, pull requests
- Event Logs: Real-time tool execution events and API errors
Executive cockpit for productivity insights:
- Hero Stats: Today's spend, tokens used, lines changed, tool calls, cache efficiency
- Activity Timeline: Cost and token usage over time with model breakdown
- What Claude Did: Top tools used, code velocity (lines added/removed)
- Cost Intelligence: Spending by model over time, token breakdown, cache savings
- Live Activity: Recent tool executions and errors
Deep-dive analysis for token optimization:
- Overview: Total tokens, token rate, cache efficiency, estimated cost
- Token Usage Over Time: Rate by type, cumulative usage trends
- Model Analysis: Tokens by model over time, model distribution pie chart
- Session Analysis: Top sessions by token usage, active sessions over time
- Cache Intelligence: Cache efficiency over time, cache savings estimate
Key configuration options (see CLAUDE_OBSERVABILITY.md for complete reference):
# Core telemetry
CLAUDE_CODE_ENABLE_TELEMETRY=1
# Exporter configuration
OTEL_METRICS_EXPORTER=otlp,prometheus # Multiple exporters
OTEL_LOGS_EXPORTER=otlp
# Protocol and endpoints
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer token"
# Export intervals
OTEL_METRIC_EXPORT_INTERVAL=60000 # 1 minute (production)
OTEL_LOGS_EXPORT_INTERVAL=5000 # 5 seconds
# Privacy controls
OTEL_LOG_USER_PROMPTS=1 # Enable prompt content logging
# Cardinality control
OTEL_METRICS_INCLUDE_SESSION_ID=true
OTEL_METRICS_INCLUDE_VERSION=false
OTEL_METRICS_INCLUDE_ACCOUNT_UUID=true
The OpenTelemetry collector is configured with:
- Processors: Resource enrichment and event filtering
- Multiple Pipelines: Separate routing for metrics and different event types
- Metric Relabeling: Cardinality control for better performance
Following the documentation recommendations:
- Metrics Backend: Prometheus (time series) + optional columnar stores
- Events Backend: Loki (log aggregation) with JSON parsing
- Cardinality Management: Configurable attribute inclusion
- Retention: Configure based on your analysis needs
# Stack management
make up # Start all services
make down # Stop all services
make restart # Restart services
make clean # Clean up containers and volumes
# Monitoring
make logs # View all logs
make logs-collector # View collector logs only
make status # Show service status
# Validation
make validate-config # Validate all configs
make setup-claude # Show Claude Code setup instructions
- Cost Management: Track AI assistance costs by team/project
- Productivity Measurement: Quantify development velocity improvements
- Tool Adoption: Understand which Claude Code features drive value
- Performance Optimization: Identify and resolve usage bottlenecks
- Capacity Planning: Predict infrastructure needs based on usage growth
- SLA Monitoring: Track API performance and availability
- Security: Monitor unusual usage patterns
- Resource Optimization: Optimize token usage and reduce costs
- ROI Analysis: Measure productivity gains from AI assistance
- Usage Insights: Understand adoption patterns across teams
- Cost Control: Monitor and optimize AI assistance spending
- Strategic Planning: Data-driven decisions on AI tool investments
- User Privacy: Prompt content logging is disabled by default
- Data Isolation: All data stays within your infrastructure
- Access Control: Configure Grafana authentication as needed
- Audit Trail: Complete logging of all tool usage and decisions
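
Since prompt content logging is off unless `OTEL_LOG_USER_PROMPTS=1` is set, a launch wrapper can warn when that switch has been turned on. The following is a minimal sketch. The function names are hypothetical and the check only inspects the current shell environment.

```shell
# Return 0 (true) when prompt content logging has been switched on.
prompt_logging_enabled() {
  [ "${OTEL_LOG_USER_PROMPTS:-0}" = "1" ]
}

# Print a warning to stderr if prompt text would be exported.
warn_if_prompts_logged() {
  if prompt_logging_enabled; then
    echo "WARNING: OTEL_LOG_USER_PROMPTS=1, prompt content will be exported" >&2
  fi
}
```

Running `warn_if_prompts_logged` before `claude` makes the privacy setting visible at the start of each session.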
- Claude Code Observability Documentation - Complete reference
- OpenTelemetry Documentation - OTel specification
- Prometheus Documentation - Metrics and alerting
- Grafana Documentation - Dashboards and visualization
- Loki Documentation - Log aggregation
This observability stack implements the patterns and recommendations from the official Claude Code documentation. To contribute:
- Follow the metric naming conventions in the documentation
- Update dashboards to reflect new data sources and metrics
- Test configurations before submitting changes
- Ensure all sensitive information is excluded from commits
- Update documentation for any new features or configuration changes
This project is licensed under the MIT License - see the LICENSE file for details.
- Built following the Claude Code Observability Documentation
- Uses OpenTelemetry standards for metrics and events
- Implements industry best practices for observability stack architecture


