GitHub - jcaperella29/Enrichment_Analysis_LLM_APP: An AI-powered web app for interpreting miRNA enrichment results, prioritizing biological drivers, and designing follow-up experiments.

MIT license

Repository contents:

playbook
templates
.env
LICENSE
README.txt
app.py
biofit.py
index_playbook.py
indexer.py
pipeline.py
program_summarizer.py
reasoner.py
requirements.txt
singularity.def
summarizer.py
triage.py


ENRICHMENT TRIAGE APP – NEED-TO-KNOW README
========================================

This app takes miRNA / pathway enrichment results (an Enrichr-style CSV),
feeds them into an LLM, and produces a structured biological interpretation
plus a PDF report.

It is designed to run locally, in Docker, or on HPC via Apptainer (Singularity).


----------------------------------------
1) WHAT YOU NEED
----------------------------------------

• OpenAI API key  
• Docker OR Apptainer (Singularity)  

Set these environment variables (for example, in your .env file):

OPENAI_API_KEY=sk-xxxxx
VECTOR_STORE_ID=

----------------------------------------
OpenAI API key & Vector Store setup
----------------------------------------

1) Get an OpenAI API key  
• Go to https://platform.openai.com  
• Create an account or log in  
• Click your profile → “View API keys”  
• Create a new key and copy it  

Set it in your environment:
    export OPENAI_API_KEY=sk-...
or put it in your .env file:
    OPENAI_API_KEY=sk-...


2) (Optional but recommended) Create a Vector Store for RAG

Vector stores let the model retrieve your playbook, rules, and priors.

• Go to https://platform.openai.com → Storage → Vector Stores  
• Create a new vector store  
• Upload your playbook / markdown files  
• Copy the Vector Store ID  

Set it in your environment:
    export VECTOR_STORE_ID=vs_...
or in .env:
    VECTOR_STORE_ID=vs_...

If no VECTOR_STORE_ID is set, the app still works, but runs in “no-RAG” mode
(using general biological reasoning only).
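The two variables above could be read at startup roughly as follows. This is a
minimal sketch, not the app's actual code: the names Settings, load_settings,
and rag_enabled are hypothetical, and only the variable names OPENAI_API_KEY
and VECTOR_STORE_ID come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two documented environment variables."""
    api_key: str
    vector_store_id: Optional[str]

    @property
    def rag_enabled(self) -> bool:
        # No vector store ID means the "no-RAG" mode described above.
        return self.vector_store_id is not None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping (the process environment by default)."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # Fail early with a clear message instead of a 500 at request time.
        raise RuntimeError("OPENAI_API_KEY is not set")
    # An empty VECTOR_STORE_ID= line in .env is treated the same as unset.
    vector_store_id = env.get("VECTOR_STORE_ID", "").strip() or None
    return Settings(api_key=api_key, vector_store_id=vector_store_id)
```

Treating an empty VECTOR_STORE_ID as unset matches the blank default shown in
section 1.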


----------------------------------------
2) RUNNING WITH DOCKER
----------------------------------------

Build:
docker build -t enrichment-triage .

Run:
docker run -p 5000:5000 \
  --env-file .env \
  enrichment-triage

Open in browser:
http://localhost:5000


----------------------------------------
3) RUNNING WITH APPTAINER (HPC)
----------------------------------------

Build:
apptainer build enrichment-triage.sif singularity.def

Run (important: bind a writable reports folder):
mkdir -p reports
apptainer run \
  --bind "$(pwd)/reports":/reports \
  --env-file .env \
  enrichment-triage.sif

Open:
http://localhost:5000

PDFs will appear in ./reports/



----------------------------------------
4) LONG-RUNNING JOBS
----------------------------------------

LLM reasoning can take several minutes for large datasets.

The Gunicorn worker timeout (in seconds) is set in the Dockerfile:
--timeout 400

If you get timeouts:
• Increase this value
• Or use fewer workers (-w 1)
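As an alternative to command-line flags, Gunicorn can read the same settings
from a Python config file. The sketch below is an assumption, not a file in
this repository: the file name gunicorn.conf.py and the value 600 are
illustrative. bind, workers, and timeout are standard Gunicorn settings.

```python
# gunicorn.conf.py (hypothetical file; not part of the repository listing)

# Listen on the port used by the run commands in this README.
bind = "0.0.0.0:5000"

# A single worker, equivalent to "-w 1".
workers = 1

# Seconds a worker may stay silent before Gunicorn restarts it.
# Raised above the documented 400 as an example for large datasets.
timeout = 600
```

Gunicorn loads such a file with the -c option, for example
"gunicorn -c gunicorn.conf.py app:app", where app:app is an assumed
module:callable name.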



----------------------------------------
5) RUNNING AN ANALYSIS (THE UI FLOW)
----------------------------------------

1. In the left “Inputs” panel:
   • Upload your Enrichr-style CSV
   • Enter the Phenotype (what you care about biologically)
   • Select Assay, Organism, Tissue, Cell Type, Perturbation, Timepoint

2. Click the blue **Analyze** button.

3. While running:
   • The backend performs enrichment triage
   • The LLM reasons about programs, confounders, and biology
   • This may take 30 seconds to several minutes for large datasets

4. When finished:
   • The Results panel populates
   • The **Programs**, **Top Terms**, and **Raw JSON** tabs become active
   • The status badge shows **Ready ✓**
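Before uploading, it can help to check the CSV locally. The sketch below
assumes the column names of a typical Enrichr table export ("Term",
"Adjusted P-value", "Genes"); which columns the app actually requires is not
stated here, so REQUIRED_COLUMNS and read_enrichr_rows are hypothetical.

```python
import csv
import io
from typing import Dict, List

# Assumed minimum set of columns, based on a typical Enrichr export.
REQUIRED_COLUMNS = ("Term", "Adjusted P-value", "Genes")


def read_enrichr_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text, check the header, and sort by adjusted p-value."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    rows = list(reader)
    # Most significant terms first.
    rows.sort(key=lambda r: float(r["Adjusted P-value"]))
    return rows


if __name__ == "__main__":
    sample = (
        "Term,Adjusted P-value,Genes\n"
        "A,0.05,TP53;MYC\n"
        "B,0.001,EGFR\n"
    )
    for row in read_enrichr_rows(sample):
        print(row["Term"], row["Adjusted P-value"])
```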


----------------------------------------
6) RESULTS & PDF GENERATION
----------------------------------------

The right-side **Results** panel contains four tabs:

• **Programs**
  Shows clustered biological programs and their scores

• **Top Terms**
  Shows the most enriched gene sets and pathways

• **Raw JSON**
  Full machine-readable output (for pipelines, notebooks, etc.)

• **PDF Report**
  Human-readable biological summary


To generate a PDF:

1. Click the **PDF Report** tab
2. Click **Generate PDF**
3. When finished, the status shows **Ready ✓**
4. A preview appears in the embedded PDF viewer
5. Click **Download PDF** to save the report


Where the PDF comes from:

• The PDF is built from:
  – Enrichment programs
  – Triage scores
  – LLM interpretation (drivers, reactive, confounders)
  – Follow-up experiments

• In Docker:
  – PDFs are stored inside the container under /app/static/reports

• In Singularity:
  – PDFs are written to /reports
  – You must bind a writable folder:
        --bind ./reports:/reports

----------------------------------------
7) WHAT THE LLM RETURNS
----------------------------------------

The LLM produces a free-text analysis that covers:
• Likely drivers vs. reactive programs vs. artifacts
• Confounders
• Follow-up experiments

The PDF builder extracts these sections by keyword grouping,
with no fragile JSON schemas:
• Programs
• Confounders
• Follow-ups
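Keyword grouping of free text can be sketched as below. This illustrates the
general idea only and is not the PDF builder's actual code: the keyword lists,
the section names, and the function group_by_keywords are all assumptions.

```python
from typing import Dict, List

# Hypothetical keyword lists; the real builder's keywords are not documented.
SECTION_KEYWORDS = {
    "programs": ("program", "driver", "reactive", "artifact"),
    "confounders": ("confound", "batch"),
    "follow_ups": ("follow-up", "follow up", "experiment", "validate"),
}


def group_by_keywords(text: str) -> Dict[str, List[str]]:
    """Assign each non-empty line to the first section whose keyword it contains."""
    groups: Dict[str, List[str]] = {name: [] for name in SECTION_KEYWORDS}
    for raw in text.splitlines():
        # Drop surrounding whitespace and common bullet characters.
        line = raw.strip(" \t-•*")
        if not line:
            continue
        lowered = line.lower()
        for name, keywords in SECTION_KEYWORDS.items():
            if any(k in lowered for k in keywords):
                groups[name].append(line)
                break  # first matching section wins; unmatched lines are dropped
    return groups
```

Because matching is by substring, the output degrades gracefully when the LLM
changes its wording, at the cost of occasionally misfiling a line.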


----------------------------------------
8) TROUBLESHOOTING
----------------------------------------

If you get 500 errors:
• Check OPENAI_API_KEY is set
• Check you bound /reports for Singularity
• Check Gunicorn timeout

If PDFs fail:
• You are probably on a read-only filesystem
• Bind a writable folder to /reports
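To confirm that the reports folder is writable before starting a long run, a
small check like the following can be used. It is a sketch: the function name
is_writable_dir is hypothetical, and /reports is the path from this README.

```python
import os
import tempfile


def is_writable_dir(path: str) -> bool:
    """Return True if a file can actually be created inside `path`."""
    if not os.path.isdir(path):
        return False
    try:
        # Creating a real temporary file also catches read-only mounts,
        # which permission bits alone may not reveal.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    target = "/reports"
    status = "writable" if is_writable_dir(target) else "NOT writable"
    print(f"{target} is {status}")
```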


----------------------------------------
9) THIS IS AN HPC APP
----------------------------------------

This was designed for:
• Slurm
• Apptainer
• Large gene sets
• Long-running LLM reasoning

It is not a toy web app.
Treat it like a scientific workflow service.