A course by DeepLearning.AI in collaboration with Guardrails AI
As AI applications become more powerful, ensuring that they are safe, reliable, and aligned with user needs is critical. Large Language Models (LLMs) can hallucinate, leak sensitive data, or drift off-topic — which can lead to unsafe or unreliable AI systems.
This course introduces Guardrails, a framework for adding structured reliability and safety checks to AI applications. Through hands-on labs and guided projects, you’ll learn how to detect, prevent, and correct failures in Retrieval-Augmented Generation (RAG) systems and chatbots.
By the end of this course, you will be able to:
- Identify common failure modes in RAG and chatbot applications.
- Understand what guardrails are and how they enforce safe AI behavior.
- Build and configure your first guardrail in a real-world application.
- Detect and mitigate hallucinations in model outputs.
- Keep an AI on-topic, ensuring relevance and reliability.
- Prevent leakage of Personally Identifiable Information (PII).
- Ensure compliance by avoiding competitor mentions and other sensitive outputs.
Intuition:
RAG systems enhance LLMs by retrieving documents before generating answers. But they commonly fail through:
- Hallucination: the model fabricates information.
- Irrelevant retrieval: documents that don't match the query degrade answer accuracy.
- Information leakage: sensitive or private data is exposed.
- Off-topic responses: the model drifts away from the user's intent.
Example:
A legal chatbot that cites fabricated case law → unsafe deployment.
Goal: Learn to spot and categorize these failures before they occur.
Definition:
Guardrails are rules, constraints, and validation checks that enforce safe and reliable AI outputs. They can:
- Reject invalid outputs.
- Enforce formatting (e.g., JSON schema).
- Filter sensitive or disallowed content.
- Monitor behavior continuously.
Analogy: Guardrails on a highway keep cars from veering off the road → AI guardrails keep LLMs aligned and safe.
Hands-on focus:
- Installing and configuring Guardrails AI.
- Defining a schema or rule (e.g., answer must be a factual sentence).
- Validating model outputs automatically.
Example:
Ensure the chatbot always answers in JSON format with the keys "answer" and "source".
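A minimal sketch of this kind of structural check in plain Python. In the labs this is what the guardrails-ai package (installed with pip install guardrails-ai) automates; the function name and the two required keys below are assumptions for illustration, not the library's API.

```python
import json

REQUIRED_KEYS = {"answer", "source"}  # keys the assumed schema requires

def validate_json_output(raw_output: str) -> dict:
    """Reject model output that is not a JSON object with the required keys."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as err:
        raise ValueError(f"Output is not valid JSON: {err}")
    if not isinstance(parsed, dict):
        raise ValueError("Output must be a JSON object")
    missing = REQUIRED_KEYS - parsed.keys()
    if missing:
        raise ValueError(f"Output is missing required keys: {missing}")
    return parsed

# A well-formed answer passes; anything else raises and can trigger a retry or fallback.
validate_json_output('{"answer": "Paris is the capital of France.", "source": "doc_12"}')
```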
Intuition:
- Hallucination = the model states something not supported by the retrieved documents.
- Use Natural Language Inference (NLI) to compare the model output against the retrieved context (a sketch follows the practice example below):
- Entailment: supported → ✅
- Contradiction / Neutral: hallucination risk → ❌
Practice Example:
- Retrieved doc: "Paris is the capital of France."
- Model output: "Paris is the capital of Germany." → Guardrail flags hallucination.
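A minimal sketch of this check, assuming the off-the-shelf facebook/bart-large-mnli model from Hugging Face. Guardrails AI ships hallucination validators that wrap this kind of logic, so treat the code as an illustration of the idea rather than the course's exact implementation.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "facebook/bart-large-mnli"  # assumed choice of NLI model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
LABELS = ["contradiction", "neutral", "entailment"]  # label order used by this model

def nli_label(premise: str, hypothesis: str) -> str:
    """Classify the relation between a retrieved doc (premise) and a model claim (hypothesis)."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

doc = "Paris is the capital of France."
claim = "Paris is the capital of Germany."
if nli_label(doc, claim) != "entailment":
    print("Guardrail: possible hallucination, the claim is not supported by the retrieved document.")
```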
Application:
Integrate hallucination guardrail into a chatbot loop:
- User asks a question.
- RAG retrieves documents.
- Model generates an answer.
- Guardrail checks output against docs.
- If hallucination detected → chatbot responds with fallback (e.g., “I don’t know”).
Result: a more reliable chatbot with less false confidence.
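A sketch of how this loop might fit together, reusing nli_label from the previous sketch; retrieve_documents and generate_answer are hypothetical stand-ins for your own RAG pipeline.

```python
FALLBACK = "I don't know. I couldn't verify that answer against my sources."

def answer_with_guardrail(question: str) -> str:
    docs = retrieve_documents(question)       # hypothetical retriever for your corpus
    answer = generate_answer(question, docs)  # hypothetical LLM call
    # Accept the answer only if at least one retrieved document entails it.
    supported = any(nli_label(doc, answer) == "entailment" for doc in docs)
    return answer if supported else FALLBACK
```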
Problem:
LLMs tend to drift into chit-chat or irrelevant topics.
Solution:
- Topic classification guardrail.
- Only allow responses if they match the allowed domain (e.g., medical chatbot → only health-related topics).
Example:
- User: “Tell me a joke.”
- Chatbot: “I’m here to answer medical questions. Please ask about symptoms or treatments.”
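A minimal sketch of a topic gate using a zero-shot classifier. The model choice, candidate labels, and threshold are assumptions, and the guardrails framework provides ready-made topic-restriction validators for the same purpose.

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
ALLOWED_TOPIC = "medical question"  # assumed allowed domain for this chatbot
CANDIDATE_TOPICS = [ALLOWED_TOPIC, "small talk", "other"]
REFUSAL = "I'm here to answer medical questions. Please ask about symptoms or treatments."

def on_topic(user_message: str, threshold: float = 0.5) -> bool:
    """Pass only messages confidently classified as in-domain."""
    result = classifier(user_message, candidate_labels=CANDIDATE_TOPICS)
    return result["labels"][0] == ALLOWED_TOPIC and result["scores"][0] >= threshold

message = "Tell me a joke."
print("(pass the message to the model)" if on_topic(message) else REFUSAL)
```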
Intuition:
AI should never reveal or generate sensitive data such as phone numbers, SSNs, credit card numbers, or addresses.
Guardrail approach:
- Regex-based and ML-based detection of PII.
- Block or mask outputs containing PII.
Example:
- Output: “John’s SSN is 123-45-6789.” → Blocked.
- Output: “Personal data cannot be shared.” → ✅ safe.
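A regex-only sketch of the detect-and-mask step. The patterns below cover just a few US-style formats and are assumptions for illustration; ML-based detectors (such as the PII validators used in the course) catch far more variations.

```python
import re

# Deliberately simple, assumed patterns: US SSNs, US-style phone numbers, email addresses.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def mask_pii(text: str) -> tuple[str, bool]:
    """Replace detected PII with placeholders and report whether anything was found."""
    found = False
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found = True
            text = pattern.sub(f"<{label}>", text)
    return text, found

masked, had_pii = mask_pii("John's SSN is 123-45-6789.")
print(masked)   # John's SSN is <SSN>.
print(had_pii)  # True
```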
Motivation:
In business settings, AI assistants should not recommend or promote competitors.
Guardrail setup:
- Define a deny-list of competitor names.
- Scan outputs and replace/block if detected.
Example:
- User: “Which platforms are better than X?”
- Guardrail prevents mention of competitors → chatbot politely declines.
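A minimal deny-list sketch; the competitor names and refusal message are placeholders, and a ready-made competitor-check validator exists in the Guardrails hub for the same purpose.

```python
# Hypothetical deny-list of competitor names to keep out of responses.
COMPETITORS = ["AcmeCorp", "ExampleSoft"]
DECLINE = "I can't comment on other vendors, but I'm happy to explain what our platform offers."

def block_competitor_mentions(draft_response: str) -> str:
    """Return a polite refusal if the draft response names any deny-listed competitor."""
    lowered = draft_response.lower()
    if any(name.lower() in lowered for name in COMPETITORS):
        return DECLINE
    return draft_response

print(block_competitor_mentions("You could also try AcmeCorp for this."))
```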
- Guardrails AI Documentation - Official guide to building guardrails.
- DeepLearning.AI Course Page - More AI safety & reliability courses.
- Treat each guardrail like a safety net, not a patch. Design with reliability in mind from the start.
- Test with adversarial prompts to see how your chatbot behaves under stress.
- Don’t over-restrict — balance safety with usability.
- Explore combining multiple guardrails (hallucination + topic control + PII) for layered safety.