A multi-agent demonstration showcasing AI agents as first-class Kubernetes citizens that collaborate to manage, monitor, and troubleshoot cluster workloads.
Transform Kubernetes operations from manual troubleshooting into collaborative agent workflows. Instead of running kubectl commands and reading logs yourself, delegate to specialized agents that can discover problems, analyze source code, correlate errors, and propose solutions autonomously.
Jane is experimenting with Kagenti as a potential agent management platform. She notices that a pod is periodically crashing, causing delays due to frequent restarts. Jane opens Claude Code and asks for help troubleshooting the cluster.
What happens:
- Claude discovers running agents in the kagenti-system-agent-team namespace that expose skills related to monitoring and debugging
- Claude delegates to the k8s-monitoring-agent to fetch logs and sees stack traces referencing a particular class in Kagenti's codebase
- Claude sees that a source-code-specialist-agent exists, so it forwards the stack trace and asks about the area of the codebase where it originated
- Using the codebase analysis, Claude generates a hypothesis about where the problem originates and proposes solutions
The magic: All of this happens through agent-to-agent communication using the A2A protocol, with agents discovering and invoking each other's skills dynamically.
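The discover-then-delegate flow in this scenario can be sketched in a few lines of Python. This is an illustrative sketch only, not the demo's implementation: the `Agent` class, `find_agent`, and `troubleshoot` are hypothetical names, agents are modeled as in-process callables rather than A2A endpoints, and the canned responses are invented. The skill ids `get_pod_logs` and `analyze_trace` are taken from the architecture diagram.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Agent:
    """Stand-in for a cluster agent: maps skill ids to local callables."""
    name: str
    skills: Dict[str, Callable[[str], str]]


def find_agent(agents: List[Agent], skill: str) -> Optional[Agent]:
    """Return the first agent that advertises `skill`, or None."""
    return next((a for a in agents if skill in a.skills), None)


def troubleshoot(agents: List[Agent], pod: str) -> str:
    """Fetch logs from one agent, then hand them to another for analysis."""
    monitor = find_agent(agents, "get_pod_logs")
    if monitor is None:
        return "no agent exposes get_pod_logs"
    logs = monitor.skills["get_pod_logs"](pod)

    analyst = find_agent(agents, "analyze_trace")
    if analyst is None:
        return f"logs only: {logs}"
    return f"hypothesis for {pod}: {analyst.skills['analyze_trace'](logs)}"


# Canned responses, invented for illustration only.
demo_agents = [
    Agent("k8s-monitoring-agent",
          {"get_pod_logs": lambda pod: f"{pod}: stack trace in ExampleClass"}),
    Agent("source-code-agent",
          {"analyze_trace": lambda trace: f"inspect the code named in '{trace}'"}),
]

print(troubleshoot(demo_agents, "demo-pod"))
```

The supervisor looks agents up by skill rather than by name, which is what lets it work with whatever agents happen to be deployed.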
┌──────────────────────────────────────────────────────────────┐
│ Supervisor (Claude Code/IDE or in-cluster Agent) │
│ • Orchestrates workflow │
│ • Has access to local files and git │
│ • Discovers cluster agents via MCP │
└────────────────┬─────────────────────────────────────────────┘
│
│ MCP Protocol
│
┌────────────────▼─────────────────────────────────────────────┐
│ A2A-to-MCP Bridge │
│ │
│ • Discovers agents via AgentCard CRDs │
│ • Translates MCP <-> A2A protocols │
│ • Can run locally OR as agent in cluster │
└────────────────┬─────────────────────────────────────────────┘
│
│ Kubernetes API (read AgentCards)
│
┌────────────────▼─────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ AgentCard Controller │ │
│ │ (kagenti-operator/agentcard_controller.go) │ │
│ │ • Watches Agent Pods with appropriate labels │ │
│ │ • Periodically fetches /.well-known/agent.json │ │
│ │ • Caches agent capabilities in AgentCard CRs │ │
│ │ • Makes agents discoverable cluster-wide │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ Namespace: kagenti-system-agent-team │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ k8s-monitoring │ │ source-code │ │
│ │ Agent Pod │ │ Agent Pod │ ... │
│ │ │ │ │ │
│ │ Skills: │ │ Skills: │ │
│ │ • get_pod_logs │ │ • search_repo │ │
│ │ • get_events │ │ • analyze_trace │ │
│ │ • get_metrics │ │ • git_blame │ │
│ └──────────────────┘ └──────────────────┘ │
│ ▲ ▲ │
│ │ │ │
│ ┌──────┴──────────────────────┴───────────────────────────┐ │
│ │ AgentCard CRs (created by controller) │ │
│ │ • k8s-monitoring-agent.kagenti-system │ │
│ │ • source-code-agent.kagenti-system │ │
│ │ (contain cached agent capabilities) │ │
│ └─────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘
Location: Here
What it does:
- Defines a Kubernetes CR that caches an agent's capabilities
- Syncs from the agent's /.well-known/agent.json endpoint
- Stores A2A-compliant agent cards in cluster
- Makes agents discoverable via kubectl/API
Status: Core CRD and controllers implemented with tests
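To make the caching step concrete, here is a rough Python sketch of turning a fetched agent card into a resource-shaped dictionary. It is not the controller's code (the diagram places that in agentcard_controller.go), and the real AgentCard schema may differ: the `apiVersion` group, the `status.card` layout, and the function name `build_agent_card_resource` are assumptions made for illustration. The card fields read here (`name`, `url`, `skills` with `id`, `name`, `description`) follow the A2A agent card format.

```python
import json
from typing import Any, Dict


def build_agent_card_resource(pod_name: str, namespace: str,
                              card_json: str) -> Dict[str, Any]:
    """Cache the parts of an agent card that discovery needs."""
    card = json.loads(card_json)
    skills = [
        {
            "id": s["id"],
            "name": s.get("name", s["id"]),
            "description": s.get("description", ""),
        }
        for s in card.get("skills", [])
    ]
    return {
        "apiVersion": "example.invalid/v1alpha1",  # hypothetical API group
        "kind": "AgentCard",
        "metadata": {"name": pod_name, "namespace": namespace},
        "status": {  # hypothetical layout for the cached card
            "card": {
                "name": card.get("name", pod_name),
                "url": card.get("url", ""),
                "skills": skills,
            }
        },
    }


# Sample card body, as it might be served at /.well-known/agent.json.
sample_card = json.dumps({
    "name": "k8s-monitoring-agent",
    "url": "http://example.invalid",
    "skills": [{"id": "get_pod_logs", "name": "Get pod logs"}],
})

resource = build_agent_card_resource(
    "k8s-monitoring-agent", "kagenti-system-agent-team", sample_card)
print(json.dumps(resource, indent=2))
```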
What it provides:
- Discovers agents via AgentCard CRD
- Allows invocation of discovered agents
What it enables:
- A supervisor can discover and use agents in a cluster to solve problems.
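One way the bridge could expose discovered skills is as one MCP tool per skill. The Python sketch below is a guess at that mapping, not the bridge's actual code: the `agent__skill` naming scheme, the single `message` argument, and both function names are assumptions. The `name`, `description`, and `inputSchema` keys are the standard fields of an MCP tool definition.

```python
from typing import Any, Dict, List, Tuple

SEPARATOR = "__"  # assumed delimiter between agent name and skill id


def skills_to_mcp_tools(agent_name: str,
                        skills: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Describe each agent skill as an MCP tool definition."""
    return [
        {
            "name": f"{agent_name}{SEPARATOR}{skill['id']}",
            "description": skill.get("description", ""),
            "inputSchema": {
                "type": "object",
                "properties": {"message": {"type": "string"}},
                "required": ["message"],
            },
        }
        for skill in skills
    ]


def route_tool_call(tool_name: str) -> Tuple[str, str]:
    """Recover (agent name, skill id) from a tool name built above."""
    agent, sep, skill = tool_name.partition(SEPARATOR)
    if not (agent and sep and skill):
        raise ValueError(f"not a bridged tool name: {tool_name!r}")
    return agent, skill


tools = skills_to_mcp_tools(
    "k8s-monitoring-agent",
    [{"id": "get_pod_logs", "description": "Fetch logs for a pod"}],
)
print(tools[0]["name"])
print(route_tool_call(tools[0]["name"]))
```

Encoding the agent name in the tool name keeps the bridge stateless when routing a call back to the agent that owns the skill.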
What they provide:
- Each agent exposes a set of skills and has access to tools that cover one aspect of cluster management: reading logs, listing CRDs, and so on.
- Together with a supervisor or the A2A <-> Kagenti Bridge, they act as a unified control plane.
- Discovery abstraction: MCP tools model maps cleanly to agent skills
- Deployment flexibility: Bridge can run locally or in-cluster
- Future-proof: Can swap out supervisor without changing agents
- Scalability: Different agents can run on different nodes
- Separation of concerns: Clear skill boundaries
- Context hygiene: Federating tools across agents reduces context pollution
- Reusability: Same monitoring agent works for multiple use cases
- Security: Can grant minimal RBAC per agent role
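As an example of the per-agent RBAC point, a monitoring agent that only reads pod logs needs little more than the Role sketched below. The function name and the role name are hypothetical, and the demo's actual manifests may grant different permissions. The `rbac.authorization.k8s.io/v1` Role structure and the `pods/log` subresource are standard Kubernetes.

```python
from typing import Any, Dict


def minimal_log_reader_role(name: str, namespace: str) -> Dict[str, Any]:
    """Build a namespaced Role that can only read pods and their logs."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # core API group
                "resources": ["pods", "pods/log"],
                "verbs": ["get", "list"],
            }
        ],
    }


# Hypothetical role name, scoped to the demo namespace.
role = minimal_log_reader_role("log-reader", "kagenti-system-agent-team")
print(role["rules"])
```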
The demo is successful when:
- Jane can ask the Supervisor: "Why is my pod crashing?"
- Supervisor automatically discovers k8s-monitoring-agent and source-code-agent
- Supervisor fetches logs from monitoring agent
- Supervisor asks source-code agent to analyze the stack trace
- Supervisor synthesizes findings and proposes a fix
- All of this happens through A2A agent collaboration, visible to the user
This is a proof-of-concept demo. To contribute:
- Check/file issues to discuss approach before major work
- Test against real Kagenti deployments
- Document your specialist agent's skills clearly in a directory-scoped README.md
Thank you for your contributions!