SapienAIAgent

Kaggle AI Agent Capstone Project: a human genomics AI agent for technical enquiries, clinical research and mutation analysis.

📘 README — SAPIEN AI: Multi-Agent Genomic Analysis Pipeline

This notebook implements SAPIEN AI, a fully automated multi-agent genomics analysis system designed for the Kaggle Agents Intensive Capstone (Google × Kaggle, Nov–Dec 2025). It turns complex genomics workflows into a single conversational interface, powered by a coordinated team of domain-specific AI agents. It is intended for Kaggle, GitHub, educational, research and workshop practice purposes only.

⚠️ Important Disclaimer (Practice Capstone Project)

This notebook is a practice capstone created for the Kaggle × Google 5-Day AI Agents Intensive. It is not validated on real clinical datasets, has not undergone any clinical QA, and is not intended for medical or diagnostic use. All variant analysis, ClinVar/VEP annotations, and gene metadata retrieval in this notebook are:

based on demo tools only,
using simplified mock or publicly accessible data,
designed for educational and prototyping purposes,
not tested or verified on real patient VCFs,
not reviewed under clinical pipelines or ISO standards.

Future versions may expand or improve the functionality, but this project must not be interpreted as a clinical workflow or used to make health-related decisions.

🚀 Project Overview

Genomic interpretation normally requires many separate steps:

VCF parsing
VEP annotation
ClinVar clinical significance lookup
Gene metadata retrieval
PubMed literature review
Final scientific report writing

This notebook unifies all of these into one conversation, by constructing a multi-agent architecture where each agent is responsible for a specific domain task and a Supervisor coordinates their execution.

The result is an end-to-end genomics intelligence system that produces a research-style Markdown report for any user question.
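As an illustration of the first step listed above, VCF parsing, here is a minimal sketch. It is not the notebook's parser: the `Variant` class and `parse_vcf_lines` function are hypothetical names, only the first five fixed VCF columns are read, and the demo records are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Variant:
    """The first five fixed columns of one VCF record (hypothetical helper type)."""
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alt: str


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per ALT allele found in the given VCF lines."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, meta-information (##...) and the column header (#CHROM...).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # malformed record: fewer than five columns
        chrom, pos, variant_id, ref, alts = fields[:5]
        # Multi-allelic sites list their ALT alleles separated by commas.
        for alt in alts.split(","):
            yield Variant(chrom, int(pos), variant_id, ref, alt)


# Made-up records, for illustration only.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\t.\t.",
    "2\t200\trsDemo\tC\tT,G\t.\t.\t.",
]
variants = list(parse_vcf_lines(demo))
for variant in variants:
    print(variant)
```

Splitting multi-allelic records into one entry per ALT allele keeps each downstream annotation tied to a single allele; a real pipeline would also need the INFO and genotype columns, which this sketch ignores.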

🧬 System Architecture

The system uses five fully defined agents:

| Agent | Purpose |
| --- | --- |
| GeneExpert | Retrieves Ensembl gene metadata using `ensembl_gene_lookup`. |
| VariantAnalyst | Parses VCF files and runs VEP annotation, ClinVar lookup, and gene inference. |
| LiteratureExpert | Uses hybrid RAG to synthesize PubMed and Semantic Scholar (S2) literature. |
| ChiefScientist | Produces the final, unified Markdown research report. |
| Supervisor | Executes the multi-step orchestration and performs Agent-to-Agent (A2A) routing. |
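To make the GeneExpert row more concrete, the sketch below shows one way a gene-metadata lookup could be written against the public Ensembl REST `lookup/symbol` endpoint. It is an assumption about how such a tool might look, not the notebook's `ensembl_gene_lookup`; the function names `build_lookup_url`, `summarize_gene` and `fetch_gene` are hypothetical, and the choice of fields to keep is arbitrary.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def build_lookup_url(symbol: str, species: str = "homo_sapiens") -> str:
    """Build the Ensembl REST URL for looking up a gene by symbol."""
    quoted = urllib.parse.quote(symbol)
    return f"{ENSEMBL_REST}/lookup/symbol/{species}/{quoted}?content-type=application/json"


def summarize_gene(record: dict) -> dict:
    """Keep a handful of fields from a lookup response; missing keys become None."""
    keys = ("id", "display_name", "biotype", "seq_region_name", "start", "end", "strand")
    return {key: record.get(key) for key in keys}


def fetch_gene(symbol: str, timeout: float = 10.0) -> dict:
    """Query Ensembl over the network and return the summarized record.

    Raises urllib.error.HTTPError if the symbol is unknown or the service fails.
    """
    with urllib.request.urlopen(build_lookup_url(symbol), timeout=timeout) as response:
        return summarize_gene(json.load(response))


# No network call is made here; this only shows the URL that would be requested.
print(build_lookup_url("BRCA1"))
```

Keeping URL construction and response trimming separate from the network call makes the tool easy to test offline, which matters in a notebook environment where outbound access may be restricted.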

🔒 Key Features

⭐ 1. True Multi-Agent Execution

Agents are not just LLM prompts—they are called programmatically using ADK’s delegate() API.

The Supervisor:

Detects the type of query
Decides which agents should run
Executes them in the correct order
Passes all outputs downstream (A2A)
Assembles a final report via ChiefScientist
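The first three Supervisor steps above (detect, decide, order) can be sketched with plain pattern matching. This is a simplified stand-in, not the notebook's routing code: the regular expressions and the helper names `find_vcf`, `find_gene` and `plan_agents` are assumptions, and the gene heuristic (an upper-case alphanumeric token) would misread acronyms such as "DNA" as gene symbols.

```python
import re
from typing import List, Optional

# Assumed heuristics: a token ending in .vcf or .vcf.gz is a VCF file,
# and an upper-case alphanumeric token of 2-10 characters is a gene symbol.
VCF_PATTERN = re.compile(r"\b[\w.-]+\.vcf(?:\.gz)?\b", re.IGNORECASE)
GENE_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]{1,9}\b")


def find_vcf(query: str) -> Optional[str]:
    """Return the first VCF filename mentioned in the query, if any."""
    match = VCF_PATTERN.search(query)
    return match.group(0) if match else None


def find_gene(query: str) -> Optional[str]:
    """Return the first gene-like token in the query, if any."""
    # Remove VCF filenames first so "SAMPLE.vcf" is not read as a gene symbol.
    stripped = VCF_PATTERN.sub(" ", query)
    match = GENE_PATTERN.search(stripped)
    return match.group(0) if match else None


def plan_agents(query: str) -> List[str]:
    """Decide which agents run for this query, in execution order."""
    plan: List[str] = []
    if find_vcf(query):
        plan.append("VariantAnalyst")
    if find_gene(query):
        plan.append("GeneExpert")
    # The literature and report stages are always part of the plan.
    plan += ["LiteratureExpert", "ChiefScientist"]
    return plan


for example in ("common lung disease", "analyse mydata.vcf", "TP53 function"):
    print(f"{example!r} -> {plan_agents(example)}")
```

Returning an ordered list of agent names keeps the routing decision separate from execution, so the plan can be inspected or tested without calling any model.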

⭐ 2. Full A2A (Agent-to-Agent) Data Passing

Variant analysis output → GeneExpert → LiteratureExpert → ChiefScientist.

All downstream modules receive upstream outputs to ensure a coherent final report.
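The chain above can be illustrated with stub agents that simply record which upstream outputs they received. This sketch does not use ADK; the `Agent` signature, `run_pipeline` and `make_stub` are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

# Assumed agent signature: (user query, upstream outputs keyed by agent name) -> text.
Agent = Callable[[str, Dict[str, str]], str]


def run_pipeline(query: str, plan: List[str], agents: Dict[str, Agent]) -> Dict[str, str]:
    """Run the planned agents in order, handing each one all earlier outputs."""
    context: Dict[str, str] = {}
    for name in plan:
        # Pass a copy so an agent cannot alter what later agents will see.
        context[name] = agents[name](query, dict(context))
    return context


def make_stub(name: str) -> Agent:
    """Build a stand-in agent that reports which upstream blocks it was given."""
    def stub(query: str, upstream: Dict[str, str]) -> str:
        seen = ", ".join(upstream) or "nothing"
        return f"{name} saw: {seen}"
    return stub


order = ["VariantAnalyst", "GeneExpert", "LiteratureExpert", "ChiefScientist"]
agents = {name: make_stub(name) for name in order}
outputs = run_pipeline("analyse sample.vcf with BRCA1", order, agents)
for name, text in outputs.items():
    print(text)
```

Because the context is accumulated rather than passed only from one neighbour to the next, the last agent sees every earlier block, which is what a coherent final report needs.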

⭐ 3. Research-Quality Markdown Output

ChiefScientist produces a multi-section genomic analysis report containing:

Genes Analyzed
Variant Tables (VEP + ClinVar)
Literature-Derived Insights
Summary & Disclaimer

Suitable for educational and research-only workflows.
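A report with the four sections listed above could be assembled roughly as follows. This is a sketch under stated assumptions: `render_report`, the report title format and the placeholder text for empty sections are hypothetical, and in the notebook the report is produced by the ChiefScientist agent rather than by a template.

```python
from typing import Optional

EMPTY = "_No data available for this section._"  # hypothetical placeholder text


def render_report(
    query: str,
    genes: Optional[str] = None,
    variants: Optional[str] = None,
    literature: Optional[str] = None,
) -> str:
    """Assemble a Markdown report; empty or missing blocks get a placeholder."""
    sections = [
        ("Genes Analyzed", genes),
        ("Variant Tables (VEP + ClinVar)", variants),
        ("Literature-Derived Insights", literature),
    ]
    lines = [f"# Genomic Analysis Report: {query}", ""]
    for title, body in sections:
        text = (body or "").strip() or EMPTY
        lines += [f"## {title}", "", text, ""]
    lines += [
        "## Summary & Disclaimer",
        "",
        "For education and research only; not for clinical decisions.",
    ]
    return "\n".join(lines)


report = render_report("TP53 function", genes="TP53 (gene metadata block goes here)")
print(report)
```

Rendering a placeholder instead of raising on a missing block means a literature-only question still yields a complete, well-formed report.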

⭐ 4. Safe Fallback Logic

If no VCF → VariantAnalyst is skipped
If no gene symbol → GeneExpert is skipped
LiteratureExpert always runs
ChiefScientist accepts empty blocks (no errors)

⭐ 5. Diagnostic Mode

A dedicated testing cell validates pipeline behavior, ensuring:

Correct routing
Correct A2A data passing
No missing-context errors

🧪 Testing the Multi-Agent System

Use the provided A2A Diagnostic Suite to validate end-to-end functionality.

Example test cases:

liver disease
BRCA1 gene
my_sample.vcf
analyse sample.vcf
analyse sample.vcf with BRCA1

This confirms that:

VariantAnalyst runs only when appropriate
GeneExpert activates only on gene symbols
LiteratureExpert always runs
ChiefScientist receives VA / GENE / LIT blocks correctly
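A diagnostic check of this kind can be expressed as a table of queries and the agent plan each one should produce. The harness below is a sketch, not the notebook's A2A Diagnostic Suite: `run_diagnostics` accepts any routing function, and `toy_route` is a deliberately crude stand-in for the Supervisor. The expected plans are derived from the fallback rules stated in this README.

```python
from typing import Callable, Dict, List, Tuple

ALWAYS = ["LiteratureExpert", "ChiefScientist"]

# Expected plans for the example test cases, following the README's routing rules.
CASES: Dict[str, List[str]] = {
    "liver disease": ALWAYS,
    "BRCA1 gene": ["GeneExpert"] + ALWAYS,
    "my_sample.vcf": ["VariantAnalyst"] + ALWAYS,
    "analyse sample.vcf": ["VariantAnalyst"] + ALWAYS,
    "analyse sample.vcf with BRCA1": ["VariantAnalyst", "GeneExpert"] + ALWAYS,
}


def run_diagnostics(
    route: Callable[[str], List[str]], cases: Dict[str, List[str]]
) -> List[Tuple[str, bool]]:
    """Return (query, passed) for every case, comparing the plan to the expectation."""
    return [(query, route(query) == expected) for query, expected in cases.items()]


def toy_route(query: str) -> List[str]:
    """Crude stand-in router: word-level checks only, for demonstration."""
    words = query.split()
    plan: List[str] = []
    if any(word.lower().endswith(".vcf") for word in words):
        plan.append("VariantAnalyst")
    if any(word.isupper() and word.isalnum() for word in words):
        plan.append("GeneExpert")
    return plan + ALWAYS


results = run_diagnostics(toy_route, CASES)
for query, passed in results:
    print(f"{'PASS' if passed else 'FAIL'}  {query}")
```

Passing the router in as an argument lets the same case table be reused against the real Supervisor once it exposes its routing decision.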

📁 Notebook Structure

| Section | Description |
| --- | --- |
| Cell 1 – Environment Setup | Loads ADK, tools, keys, and supporting libraries. |
| Cell 2 – Tool Definitions | PubMed, Ensembl, VEP, ClinVar, RAG, etc. |
| Cell 3 – Multi-Agent System | Builds all agents + real execution Supervisor + App + Runner. |
| Cell 4 – Interactive Mode | Fully conversational genomics assistant. |
| Cell X – A2A Diagnostic Suite | Validates multi-agent routing & pipeline integration. |

🛠 How to Use

1. Enter Interactive Mode

Once the notebook shows:

SAPIEN AI – INTERACTIVE MODE ACTIVATED

You can type questions such as:

common lung disease
analyse mydata.vcf
TP53 function
final report

2. The Supervisor chooses the correct agents:

VCF analysis
Gene metadata
Literature synthesis
or general biomedical reasoning

3. ChiefScientist returns the final scientific Markdown report.
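The three steps above amount to a read, route, report loop. The sketch below shows the general shape, assuming a `handle` callable that wraps the Supervisor; `interactive_session` and the `exit`/`quit` commands are hypothetical and not taken from the notebook. Only the banner text comes from this README.

```python
from typing import Callable, Iterable, List

BANNER = "SAPIEN AI – INTERACTIVE MODE ACTIVATED"


def interactive_session(
    handle: Callable[[str], str],
    inputs: Iterable[str],
    write: Callable[[str], None] = print,
) -> None:
    """Feed each query to `handle` and write the report until the user quits."""
    write(BANNER)
    for line in inputs:
        query = line.strip()
        if not query:
            continue  # ignore empty input
        if query.lower() in {"exit", "quit"}:  # assumed stop commands
            break
        write(handle(query))


# Scripted demo with a stub handler, so no keyboard input is needed.
log: List[str] = []
interactive_session(
    lambda query: f"(report for: {query})",
    ["TP53 function", "", "quit", "never reached"],
    write=log.append,
)
print("\n".join(log))
```

In a notebook cell, live input could be supplied with `iter(lambda: input("> "), "exit")`; taking the inputs as an iterable keeps the loop testable without a keyboard.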

📌 Limitations

This system is for education and research only. It is not a medical device and should not be used for clinical decisions.

Variant annotations rely on external databases and may not be fully complete or up to date.

📚 References

Ensembl REST API
ClinVar variation services
PubMed & Semantic Scholar
VEP (Variant Effect Predictor)
Google ADK (Agent Development Kit)

🎓 License

This project is released for educational and research use under the Kaggle Agents Intensive rules.
