
Kaggle AI Agent Capstone Project - Human Genomics AI Agent allowing technical enquiries, clinical research and mutation analysis


pwwongaa/SapienAIAgent


SapienAIAgent

Kaggle AI Agent Capstone Project: a human genomics AI agent for technical enquiries, clinical research, and mutation analysis.


📘 README — SAPIEN AI: Multi-Agent Genomic Analysis Pipeline

This notebook implements SAPIEN AI, a fully automated multi-agent genomics analysis system designed for the Kaggle Agents Intensive Capstone (Google × Kaggle, Nov–Dec 2025). It transforms complex genomics workflows into a single conversational interface, powered by a coordinated team of domain-specific AI agents. It is intended for Kaggle, GitHub, educational, research, and workshop practice only.


⚠️ Important Disclaimer (Practice Capstone Project)

This notebook is a practice capstone created for the Kaggle × Google 5-Day AI Agents Intensive. It is not validated on real clinical datasets, has not undergone any clinical QA, and is not intended for medical or diagnostic use. All variant analysis, ClinVar/VEP annotations, and gene metadata retrieval in this notebook are:

  • based on demo tools only,
  • using simplified mock or publicly accessible data,
  • designed for educational and prototyping purposes,
  • not tested or verified on real patient VCFs,
  • not reviewed under clinical pipelines or ISO standards.

Future versions may expand or improve the functionality, but this project must not be interpreted as a clinical workflow or used to make health-related decisions.

🚀 Project Overview

Genomic interpretation normally requires many separate steps:

  • VCF parsing
  • VEP annotation
  • ClinVar clinical significance lookup
  • Gene metadata retrieval
  • PubMed literature review
  • Final scientific report writing

This notebook unifies all of these into one conversation, by constructing a multi-agent architecture where each agent is responsible for a specific domain task and a Supervisor coordinates their execution.

The result is an end-to-end genomics intelligence system that produces a research-style Markdown report for any user question.
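To make the first step in the list above concrete, here is a minimal sketch of VCF parsing using only the Python standard library. It is not the notebook's parser: the `Variant` class and `parse_vcf_lines` function are hypothetical names, the sketch reads only the first five fixed columns, and the demo record uses made-up coordinates.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Variant:
    """Hypothetical container for the core fields of one VCF record."""
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per ALT allele from tab-separated VCF text lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines (##) and the header (#CHROM)
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # malformed record; a real parser would report this
        chrom, pos, _id, ref, alts = fields[:5]
        for alt in alts.split(","):  # multi-allelic sites list ALTs comma-separated
            if alt != ".":  # "." means no alternate allele is recorded
                yield Variant(chrom, int(pos), ref, alt)


demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "17\t43000000\t.\tG\tA,T\t.\tPASS\t.",  # made-up coordinates for illustration
]
variants = list(parse_vcf_lines(demo))
for v in variants:
    print(v)
```

Splitting multi-allelic records into one object per ALT allele keeps later annotation steps simple, since annotation is typically reported per allele.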


🧬 System Architecture

The system uses five fully defined agents:

| Agent | Purpose |
| --- | --- |
| GeneExpert | Retrieves Ensembl gene metadata using ensembl_gene_lookup. |
| VariantAnalyst | Parses VCF files and runs VEP annotation, ClinVar lookup, and gene inference. |
| LiteratureExpert | Uses hybrid RAG to synthesize PubMed and Semantic Scholar (S2) literature. |
| ChiefScientist | Produces the final, unified Markdown research report. |
| Supervisor | Executes the multi-step orchestration and performs Agent-to-Agent (A2A) routing. |
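As an illustration of what a gene metadata tool such as ensembl_gene_lookup might do, the sketch below queries the public Ensembl REST `lookup/symbol` endpoint. This is an assumption about the approach, not the notebook's code: the function names `ensembl_lookup_url`, `fetch_gene`, and `summarize_gene` are hypothetical, and the record passed to `summarize_gene` at the end contains placeholder values rather than real coordinates.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def ensembl_lookup_url(symbol: str, species: str = "homo_sapiens") -> str:
    """Build the Ensembl REST URL for looking up a gene by symbol."""
    safe_symbol = urllib.parse.quote(symbol, safe="")
    return (
        f"{ENSEMBL_REST}/lookup/symbol/{species}/{safe_symbol}"
        "?content-type=application/json"
    )


def fetch_gene(symbol: str, timeout: float = 10.0) -> dict:
    """Fetch raw gene metadata. Performs a network request when called."""
    with urllib.request.urlopen(ensembl_lookup_url(symbol), timeout=timeout) as resp:
        return json.load(resp)


def summarize_gene(record: dict) -> str:
    """Format the fields a report would most likely need into one line."""
    return (
        f"{record.get('display_name', 'unknown')} ({record.get('id', 'n/a')}): "
        f"{record.get('biotype', 'n/a')}, "
        f"chr{record.get('seq_region_name', '?')}:"
        f"{record.get('start', '?')}-{record.get('end', '?')}"
    )


# Placeholder record with the same field names the endpoint returns;
# the values are dummies, not real annotation.
placeholder = {
    "id": "ENSG_PLACEHOLDER",
    "display_name": "GENE1",
    "biotype": "protein_coding",
    "seq_region_name": "1",
    "start": 100,
    "end": 200,
}
print(ensembl_lookup_url("BRCA1"))
print(summarize_gene(placeholder))
```

Calling `summarize_gene(fetch_gene("BRCA1"))` would run the live lookup; the sketch avoids doing so at import time so it can be read and tested offline.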

🔒 Key Features

1. True Multi-Agent Execution

Agents are not just LLM prompts—they are called programmatically using ADK’s delegate() API.

The Supervisor:

  • Detects the type of query
  • Decides which agents should run
  • Executes them in the correct order
  • Passes all outputs downstream (A2A)
  • Assembles a final report via ChiefScientist

2. Full A2A (Agent-to-Agent) Data Passing

Variant analysis output → GeneExpert → LiteratureExpert → ChiefScientist.

All downstream modules receive upstream outputs to ensure a coherent final report.
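The A2A chain above can be pictured as a loop in which every agent receives the outputs of all agents that ran before it. The sketch below uses plain Python callables as stand-ins for the real agents and does not use ADK; `make_stub` and `run_chain` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]             # agent name -> that agent's output
Agent = Callable[[str, Context], str]


def make_stub(name: str) -> Agent:
    """Create a stand-in agent that reports which upstream outputs it saw."""
    def agent(query: str, upstream: Context) -> str:
        seen = ", ".join(upstream) or "nothing"
        return f"{name} handled {query!r} after seeing: {seen}"
    return agent


def run_chain(query: str, steps: List[Tuple[str, Agent]]) -> Context:
    """Run agents in order, handing each one every upstream output so far."""
    context: Context = {}
    for name, agent in steps:
        # Pass a copy so an agent cannot alter what later agents receive.
        context[name] = agent(query, dict(context))
    return context


order = ["VariantAnalyst", "GeneExpert", "LiteratureExpert", "ChiefScientist"]
result = run_chain(
    "analyse sample.vcf with BRCA1",
    [(name, make_stub(name)) for name in order],
)
for name, output in result.items():
    print(f"{name}: {output}")
```

Keeping the accumulated outputs in one dictionary keyed by agent name means the last agent in the chain can assemble its report without any agent needing to know about the others.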

3. Research-Quality Markdown Output

ChiefScientist produces a multi-section genomic analysis report containing:

  • Genes Analyzed
  • Variant Tables (VEP + ClinVar)
  • Literature-Derived Insights
  • Summary & Disclaimer

Suitable for educational and research-only workflows.
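A report with the sections listed above could be assembled along these lines. In the notebook the report is written by the ChiefScientist agent; the deterministic `build_report` function here is a hypothetical stand-in that only shows the section layout, and the placeholder text for missing blocks is an assumption.

```python
def build_report(question: str, genes: str = "", variants: str = "",
                 literature: str = "") -> str:
    """Assemble a Markdown report; missing blocks get a placeholder line."""
    placeholder = "_No data available for this query._"
    sections = [
        ("Genes Analyzed", genes),
        ("Variant Tables (VEP + ClinVar)", variants),
        ("Literature-Derived Insights", literature),
        ("Summary & Disclaimer",
         "For education and research only; not for clinical use."),
    ]
    lines = [f"# Genomic Analysis Report: {question}", ""]
    for title, body in sections:
        # Whitespace-only blocks are treated the same as empty ones.
        lines += [f"## {title}", "", body.strip() or placeholder, ""]
    return "\n".join(lines)


report = build_report("TP53 function", genes="TP53")
print(report)
```

Rendering every section even when its block is empty keeps the report structure stable across query types, so a reader can tell at a glance which agents contributed.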

4. Safe Fallback Logic

  • If no VCF → VariantAnalyst is skipped
  • If no gene symbol → GeneExpert is skipped
  • LiteratureExpert always runs
  • ChiefScientist accepts empty blocks (no errors)
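The fallback rules above amount to a small planning function. The sketch below is one possible reading of them, not the Supervisor's actual logic: `plan_agents` is a hypothetical name, and both regular expressions are naive heuristics (for example, any all-caps token such as "DNA" would be treated as a gene symbol).

```python
import re
from typing import List

# Heuristic: a file name ending in .vcf or .vcf.gz (assumption).
VCF_RE = re.compile(r"\b[\w.-]+\.vcf(?:\.gz)?\b", re.IGNORECASE)
# Heuristic: an upper-case token of 2-10 letters/digits, e.g. BRCA1, TP53 (assumption).
GENE_RE = re.compile(r"\b[A-Z][A-Z0-9]{1,9}\b")


def plan_agents(query: str) -> List[str]:
    """Return the agents to run, in execution order, for a user query."""
    plan: List[str] = []
    if VCF_RE.search(query):
        plan.append("VariantAnalyst")      # skipped when no VCF is mentioned
    # Remove file names first so "SAMPLE.VCF" is not mistaken for a gene.
    remainder = VCF_RE.sub(" ", query)
    if GENE_RE.search(remainder):
        plan.append("GeneExpert")          # skipped when no gene symbol is found
    plan.append("LiteratureExpert")        # always runs
    plan.append("ChiefScientist")          # always assembles the final report
    return plan


for q in ["common lung disease", "TP53 function", "analyse mydata.vcf"]:
    print(f"{q!r} -> {plan_agents(q)}")
```

Because LiteratureExpert and ChiefScientist are appended unconditionally, every query yields a non-empty plan, which is what allows the pipeline to degrade gracefully instead of failing.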

5. Diagnostic Mode

A dedicated testing cell validates pipeline behavior, ensuring:

  • Correct routing
  • Correct A2A
  • No missing-context errors

🧪 Testing the Multi-Agent System

Use the provided A2A Diagnostic Suite to validate end-to-end functionality.

Example test cases:

liver disease
BRCA1 gene
my_sample.vcf
analyse sample.vcf
analyse sample.vcf with BRCA1

This confirms that:

  • VariantAnalyst runs only when appropriate
  • GeneExpert activates only on gene symbols
  • LiteratureExpert always runs
  • ChiefScientist receives VA / GENE / LIT blocks correctly
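One way to check the properties above is to record what happened during a run and validate that record against a few invariants. The sketch below is not the notebook's A2A Diagnostic Suite: the `Trace` structure and `check_trace` function are hypothetical, and detecting a VCF by looking for ".vcf" in the query is a simplifying assumption. The gene-symbol rule is left out because it depends on how the real system detects symbols.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trace:
    """Hypothetical record of one pipeline run."""
    query: str
    agents_run: List[str]
    chief_inputs: Dict[str, str] = field(default_factory=dict)


def check_trace(trace: Trace) -> List[str]:
    """Return a list of violated invariants (empty means the run looks fine)."""
    problems: List[str] = []
    has_vcf = ".vcf" in trace.query.lower()
    ran_va = "VariantAnalyst" in trace.agents_run
    if ran_va and not has_vcf:
        problems.append("VariantAnalyst ran without a VCF in the query")
    if has_vcf and not ran_va:
        problems.append("VariantAnalyst was skipped despite a VCF in the query")
    if "LiteratureExpert" not in trace.agents_run:
        problems.append("LiteratureExpert did not run")
    if not trace.agents_run or trace.agents_run[-1] != "ChiefScientist":
        problems.append("ChiefScientist was not the final agent")
    # Blocks may be empty strings, but each key must be present.
    missing = [k for k in ("VA", "GENE", "LIT") if k not in trace.chief_inputs]
    if missing:
        problems.append(f"ChiefScientist is missing blocks: {', '.join(missing)}")
    return problems


good = Trace(
    query="liver disease",
    agents_run=["LiteratureExpert", "ChiefScientist"],
    chief_inputs={"VA": "", "GENE": "", "LIT": "summary text"},
)
bad = Trace(query="liver disease", agents_run=["VariantAnalyst", "ChiefScientist"])
print(check_trace(good))
print(check_trace(bad))
```

Checking recorded traces rather than the router itself means the same invariants can be applied to any run, including ones produced by a live model whose routing decisions are not deterministic.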

📁 Notebook Structure

| Section | Description |
| --- | --- |
| Cell 1 – Environment Setup | Loads ADK, tools, keys, and supporting libraries. |
| Cell 2 – Tool Definitions | PubMed, Ensembl, VEP, ClinVar, RAG, etc. |
| Cell 3 – Multi-Agent System | Builds all agents + real execution Supervisor + App + Runner. |
| Cell 4 – Interactive Mode | Fully conversational genomics assistant. |
| Cell X – A2A Diagnostic Suite | Validates multi-agent routing & pipeline integration. |

🛠 How to Use

1. Enter Interactive Mode

Once the notebook shows:

SAPIEN AI – INTERACTIVE MODE ACTIVATED

You can type questions such as:

common lung disease
analyse mydata.vcf
TP53 function
final report

2. The Supervisor chooses the correct agents:

  • VCF analysis
  • Gene metadata
  • Literature synthesis
  • or general biomedical reasoning

3. ChiefScientist returns the final scientific Markdown report.


📌 Limitations

This system is for education and research only. It is not a medical device and should not be used for clinical decisions.

Variant annotations rely on external databases and may not be fully complete or up to date.


📚 References

  • Ensembl REST API
  • ClinVar variation services
  • PubMed & Semantic Scholar
  • VEP (Variant Effect Predictor)
  • Google ADK (Agents Development Kit)

🎓 License

This project is released for educational and research use under the Kaggle Agents Intensive rules.
