This is the repository for the LinkedIn Learning course Operating AI Agents: Failure and Recovery. The full course is available from LinkedIn Learning.
As AI agents shift from experimentation to production, operational failures can create serious business risks. This intermediate course explores practical techniques for monitoring agent behavior, tracing execution paths, and identifying failure modes across single‑ and multi‑agent systems. Through hands-on GitHub Codespaces exercises, you learn how to implement rollback mechanisms, build automated recovery workflows, and create reports that surface agent health and system status in real time. By the end of the course, you’ll have the skills to improve the safety and predictability of AI agents in production, and to respond quickly and effectively when failures occur.
See the readme file in the main branch for updated instructions and information.
You’ll learn how to:
- Detect and diagnose AI agent failures in production using monitoring, logging, and execution‑tracing techniques.
- Analyze execution logs and system state to identify a failure, attribute it to the specific agent and operation that caused it, and determine its scope and impact by comparing pre‑ and post‑action states.
- Implement rollback and other recovery mechanisms that restore a known‑good system state after unintended or destructive agent actions.
- Evaluate recovery success by validating restored state, confirming data integrity, and reviewing post‑recovery logs.
- Build automated recovery workflows and operational reports that surface agent health, failures, and recovery actions in real time.
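The following minimal Python sketch makes the diagnosis, rollback, and validation objectives above more concrete. It is hypothetical and not the course's code. `RecoverableStore`, `fingerprint`, `diff_states`, and `health_report` are illustrative names, and the "system state" is assumed to be a JSON-serializable dictionary held in memory. The sketch works as follows:

- Each agent action is logged with its agent and operation.
- A snapshot and a SHA-256 checksum are recorded before the action runs.
- Pre‑ and post‑action states are diffed to show the action's scope.
- A failed action is undone automatically.
- A rollback is validated against the recorded checksum.

```python
import copy
import hashlib
import json
from collections import Counter
from typing import Any, Callable, Dict, List

State = Dict[str, Any]  # assumption: system state is a JSON-serializable dict


def fingerprint(state: State) -> str:
    """Stable SHA-256 digest of a JSON-serializable state, used for integrity checks."""
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def diff_states(before: State, after: State) -> Dict[str, List[str]]:
    """Compare pre- and post-action snapshots to see which keys an action touched."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }


class RecoverableStore:
    """Hypothetical in-memory system state with an execution trace and rollback."""

    def __init__(self, state: State) -> None:
        self.state = copy.deepcopy(state)
        self.trace: List[Dict[str, Any]] = []  # one entry per agent action or recovery step

    def run(self, agent: str, operation: str, action: Callable[[State], None]) -> Dict[str, Any]:
        """Run one agent action, logging who did what and what changed."""
        snapshot = copy.deepcopy(self.state)  # known-good state before the action
        entry: Dict[str, Any] = {"agent": agent, "operation": operation, "snapshot": snapshot,
                                 "checksum": fingerprint(snapshot), "status": "ok"}
        try:
            action(self.state)  # the action mutates the state in place
        except Exception as exc:
            entry["status"] = f"failed: {exc}"
        entry["scope"] = diff_states(snapshot, self.state)  # pre- vs post-action comparison
        if entry["status"] != "ok":
            self.state = copy.deepcopy(snapshot)  # automated recovery: undo a failed action
        self.trace.append(entry)
        return entry

    def rollback(self, agent: str) -> bool:
        """Restore the snapshot taken before `agent`'s latest action and validate it."""
        target = next((e for e in reversed(self.trace)
                       if e["agent"] == agent and "snapshot" in e), None)
        if target is None:
            raise LookupError(f"no recorded action for agent {agent!r}")
        # Restores the whole snapshot, discarding every change made after it was taken.
        self.state = copy.deepcopy(target["snapshot"])
        validated = fingerprint(self.state) == target["checksum"]
        self.trace.append({"agent": "recovery",
                           "operation": f"rollback {agent}/{target['operation']}",
                           "status": "ok" if validated else "validation failed"})
        return validated

    def health_report(self) -> Dict[str, Counter]:
        """Count successful and failed trace entries per agent."""
        report: Dict[str, Counter] = {}
        for e in self.trace:
            outcome = "ok" if e["status"] == "ok" else "failed"
            report.setdefault(e["agent"], Counter())[outcome] += 1
        return report


if __name__ == "__main__":
    store = RecoverableStore({"orders": 3, "status": "open"})
    store.run("billing-agent", "close_orders", lambda s: s.update(status="closed", orders=0))
    print(store.trace[-1]["scope"])  # which keys the action changed
    print(store.rollback("billing-agent"), store.state)
    print(store.health_report())
```

Because `rollback` restores an entire snapshot, it also discards changes that other agents made after that point. A production system would likely need finer-grained, per-operation compensation instead.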
This course, Operating AI Agents: Failure and Recovery, is the second course in the Governing AI Agents series. The first course is Governing AI Agents: Visibility and Control.
Before you begin, you’ll need:
- Python 3.9+
- An OpenAI API key
To set up the project locally:
- Clone this repo (or download the files).
- Create and activate a virtual environment:
  ```shell
  python -m venv venv
  source venv/bin/activate  # macOS/Linux
  venv\Scripts\activate     # Windows
  ```
- Install dependencies:
  ```shell
  pip install -r requirements.txt
  ```
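- Set your OpenAI API key as an environment variable, or place it in a `.env` file:
  ```shell
  export OPENAI_API_KEY="your_api_key"  # macOS/Linux
  setx OPENAI_API_KEY "your_api_key"    # Windows PowerShell (applies to new terminal sessions only)
  ```
  For a `.env` file, add the line `OPENAI_API_KEY=your_api_key`.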
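Once you finish the setup steps, a quick check can confirm that your environment meets the prerequisites. The following is a hypothetical helper script, for example saved as `check_setup.py`. That file name is an assumption and is not part of this repository. The script verifies the Python version, then looks for `OPENAI_API_KEY` in the environment first and in a `.env` file in the current directory second. The `.env` parsing is a deliberately minimal stand-in written with the standard library, so it does not assume any particular package is listed in `requirements.txt`.

```python
import os
import sys
from pathlib import Path
from typing import Dict, List, Optional


def read_dotenv(path: Path = Path(".env")) -> Dict[str, str]:
    """Minimal KEY=VALUE parser for a .env file (skips blank lines and # comments)."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")  # drop surrounding quotes
    return values


def find_api_key(env_file: Path = Path(".env")) -> Optional[str]:
    """Prefer the OPENAI_API_KEY environment variable, then fall back to the .env file."""
    return os.environ.get("OPENAI_API_KEY") or read_dotenv(env_file).get("OPENAI_API_KEY")


def check_setup(env_file: Path = Path(".env")) -> List[str]:
    """Return a list of setup problems; an empty list means the prerequisites look met."""
    problems: List[str] = []
    if sys.version_info < (3, 9):
        problems.append(f"Python 3.9+ required, found {sys.version.split()[0]}")
    if not find_api_key(env_file):
        problems.append("OPENAI_API_KEY not found in the environment or .env file")
    return problems


if __name__ == "__main__":
    issues = check_setup()
    for issue in issues:
        print("Setup problem:", issue)
    print("Setup looks good." if not issues else "Fix the problems above and re-run.")
```

Because `setx` only affects new sessions, run the script from a freshly opened terminal if you set the key that way.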
Instructor: Kesha Williams
Award-Winning Tech Innovator and AI/ML Leader