This repository contains a Clinical Decision Support System (CDSS) designed to assist healthcare professionals in the diagnosis and treatment of Pulmonary Embolism (PE). The system integrates Large Language Model (LLM) agents with Retrieval-Augmented Generation (RAG) to provide evidence-based clinical assistance grounded in official medical guidelines.
The system is built using an agent-based architecture that combines autonomous reasoning, dynamic information retrieval, and decision-making capabilities to interpret clinical guidelines and generate personalized recommendations for patient care.
- Allows healthcare professionals to query official PE clinical guidelines
- Provides evidence-based answers with source citations
- Supports complex clinical reasoning and interpretation
- Includes hallucination detection to ensure response accuracy
- Integrates structured patient data for personalized recommendations
- Calculates clinical risk metrics (PESI, sPESI, Risk of Early Mortality)
- Generates personalized diagnostic and treatment recommendations
- Considers patient-specific contraindications and clinical context
- Agent-Based System: Multi-agent workflow using LangGraph
- RAG Integration: Retrieval-Augmented Generation for grounded responses
- Hallucination Detection: Built-in verification system for medical safety
- Dynamic Retrieval: Intelligent information retrieval based on clinical context
- Structured Workflows: Step-by-step clinical reasoning process
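The retrieval and re-ranking idea behind the RAG integration can be illustrated in plain Python. This is a minimal sketch under stated assumptions: the vectors are toy values rather than real embeddings, the second-stage `scorer` stands in for whatever re-ranking model is used, and the names `cosine`, `retrieve`, and `rerank` are hypothetical and not taken from `retrieval.py` or `re_ranking.py`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k chunk ids whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda cid: cosine(query_vec, index[cid]), reverse=True)
    return ranked[:k]


def rerank(candidates: list[str], scorer, top_n: int = 2) -> list[str]:
    """Re-order retrieved chunk ids with a second relevance scorer."""
    return sorted(candidates, key=scorer, reverse=True)[:top_n]


# Toy index: chunk id -> embedding (illustrative values only).
index = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
candidates = retrieve([1.0, 0.0], index, k=2)

# Hypothetical second-stage scores, e.g. from a more precise relevance model.
second_stage = {"a": 0.2, "b": 0.8}
best = rerank(candidates, lambda cid: second_stage.get(cid, 0.0), top_n=1)
```

The two-stage design keeps the first pass cheap (vector similarity over the whole store) and reserves the more expensive scorer for a handful of candidates.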
├── app.py                                    # Main application entry point
├── CDSS_demo.ipynb                           # Interactive Jupyter notebook demo
├── graph.png                                 # Visual representation of the CDSS graph
├── .env.example                              # Example environment variables configuration
├── requirements.txt                          # Python dependencies
├── README.md                                 # This file
│
├── data/                                     # Clinical data and guidelines
│   ├── clinical_cases/                       # Patient database
│   │   ├── clinical_cases.xlsx               # Patient case datasets
│   │   └── pe_scores_gt.xlsx                 # Ground truth PE scores
│   │
│   └── medical_guidelines/                   # PE clinical guidelines
│       ├── processed_markdown/               # Processed guideline documents
│       └── raw/                              # Original PDF guidelines
│
├── src/                                      # Source code
│   ├── graph_compilation.py                  # LangGraph graph compilation
│   ├── llm_config.py                         # LLM configuration
│   │
│   ├── custom_config/                        # Custom configurations
│   │   ├── state_schema.py                   # State management schema
│   │   ├── custom_messages.py                # Custom message types
│   │   └── routing_functions.py              # Workflow routing logic
│   │
│   ├── nodes/                                # Agent nodes
│   │   ├── common_nodes.py                   # Initial and shared nodes
│   │   ├── guidelines_consultation_nodes.py  # Guidelines query agents
│   │   ├── clinical_case_evaluation_nodes.py # Clinical evaluation agents
│   │   └── metrics_calculation_nodes.py      # PESI/sPESI calculation
│   │
│   └── services/                             # Core services
│       ├── retrieval.py                      # Document retrieval
│       ├── re_ranking.py                     # Result re-ranking
│       ├── ingestion_functions.py            # Document processing
│       ├── hallucination_detector.py         # Response verification
│       └── tools.py                          # LangGraph tools
│
├── ingestion/                                # Document ingestion pipeline
│   └── Ingestion.ipynb                       # Document processing notebook
│
├── vectorstores/                             # Vector databases
│   └── pe_protocol/                          # PE guidelines vectorstore
│
└── experimental_results/                     # Evaluation results and datasets
    ├── clinical_case_evaluation_results/     # Clinical case evaluation results
    │   ├── evaluation_by_patient/            # Individual patient evaluation
    │   ├── clinical_case_evaluation_dataset.json  # Dataset with the CDSS final responses
    │   └── experts_evaluation_results.ods    # Expert validation results
    │
    └── guidelines_consultation_results/      # Guidelines consultation results
        ├── evaluation_questions/             # Test questions
        ├── questions_results_by_difficulty/  # Results by question difficulty
        ├── questions_results_dataset.json    # Dataset with the CDSS final responses
        └── evaluation_score_results.ods      # Performance scores
- Python 3.11 or higher
- OpenAI API key
- Clone the repository:

      git clone <repository-url>

- Create and activate a virtual environment:
  - Using venv:

        python -m venv <environment-name>
        # On Windows
        <environment-name>\Scripts\activate
        # On macOS/Linux
        source <environment-name>/bin/activate

  - Using conda:

        # Create the environment
        conda create -n <environment-name> python=3.11
        # Activate the environment
        conda activate <environment-name>

- Install dependencies:

      pip install -r requirements.txt

- Set up environment variables by creating a .env file in the root directory:

      OPENAI_API_KEY=your_openai_api_key_here
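Before starting the application it can help to confirm that the key is actually set. The following is a minimal standard-library sketch, assuming the `.env` file uses plain `KEY=VALUE` lines; `parse_env` and `require_api_key` are hypothetical helpers for illustration, and the repository itself may load the file differently (for example through a dedicated dotenv library).

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(env: dict[str, str]) -> str:
    """Return the OpenAI key, or fail early if it is missing or a placeholder."""
    key = env.get("OPENAI_API_KEY", "")
    if not key or key == "your_openai_api_key_here":
        raise RuntimeError("OPENAI_API_KEY is not set in .env")
    return key


sample = "# local settings\nOPENAI_API_KEY=sk-example\n"
env = parse_env(sample)
api_key = require_api_key(env)
```

In practice the text would come from reading the `.env` file in the repository root; the sample string above only keeps the sketch self-contained.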
To run the application from the command line:

    cd <path-to-repository>
    python app.py

To run the interactive notebook demo instead:

    jupyter notebook CDSS_demo.ipynb

- Patient Selection: Choose a patient from the clinical cases dataset (Patients 1-20)
- Service Selection:
  - Guidelines Consultation: Ask questions about PE clinical guidelines
  - Clinical Case Evaluation: Request personalized recommendations for the selected patient
- Interactive Process: The system will guide you through the clinical reasoning process
- Results: Receive evidence-based recommendations with source citations
- "What are the diagnostic criteria for pulmonary embolism?"
- "When should thrombolytic therapy be considered?"
- "What are the contraindications for anticoagulation?"
- "Help me with this clinical case"
- "How should I proceed with the selected patient?"
- "Analyze this patient"
The system includes a complete pipeline for ingesting new medical documents and creating vector stores.
- Open the ingestion notebook:

      jupyter notebook ingestion/Ingestion.ipynb

- PDF to Markdown Conversion:
  - Uses Docling for accurate medical document conversion
  - Preserves document structure and formatting
  - Handles tables, figures, and references
- Document Processing:
  - Intelligent chunking using markdown headers
  - Metadata extraction and cross-reference handling
  - Duplicate content removal
- Vector Store Creation:
  - Uses OpenAI embeddings (text-embedding-3-large)
  - Stores in Chroma vector database
  - Supports semantic search and retrieval
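The header-based chunking step listed above can be illustrated without any third-party library. This is a minimal sketch, assuming chunks are split at every Markdown heading and each chunk keeps its heading path as metadata; `chunk_by_headers` is a hypothetical helper and not the function implemented in `ingestion_functions.py`.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headers(markdown: str) -> list[dict]:
    """Split Markdown text into chunks, one per heading section."""
    chunks: list[dict] = []
    path: list[tuple[int, str]] = []  # (level, title) of the enclosing headings
    body: list[str] = []              # lines collected for the current section

    def flush() -> None:
        # Emit the collected lines under the heading path active so far.
        text = "\n".join(body).strip()
        if text:
            chunks.append({"headers": [title for _, title in path], "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Leave any section at the same or a deeper level before entering this one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks


sample = "# Diagnosis\nIntro text.\n## Imaging\nText about imaging.\n# Treatment\nText about treatment."
chunks = chunk_by_headers(sample)
```

Keeping the heading path with each chunk lets a retrieved passage be traced back to its section of the guideline, which supports source citation.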
# 1. Convert PDF to Markdown
convert_pdf_to_markdown(pdf_path, markdown_path)

# 2. Process markdown files
markdown_documents = read_markdown_files(markdown_folder)

# 3. Create vector store
# (processed_chunks holds the chunked documents produced from markdown_documents)
vectorstore = Chroma.from_documents(
    documents=processed_chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-large"),
    collection_name="your_collection_name",
    persist_directory="vectorstores/your_directory",
)

The CDSS is built as a multi-agent workflow using LangGraph. The nodes of the system architecture and their relationships are listed below; graph.png shows the same graph visually:
- START: System initialization point
- Patient Processor: Loads patient data from Excel files, handles patient selection (1-20), and initializes patient state
- Query Input: Manages user input and query processing, handles interactive communication with healthcare professionals
- Orchestrator: Central routing agent that classifies queries and directs them to appropriate service pathways
- Query Solver: Main agent for processing clinical guideline queries, performs multi-step reasoning and coordinates information retrieval
- Dynamic Retrieval Tool: Retrieves relevant information from PE guidelines vectorstore, supports semantic search and re-ranking
- PESI Parameters Evaluator: Identifies required parameters for PESI/sPESI calculation, checks for missing patient data
- PESI Patient Data Request Tool: Requests missing patient data specifically for PESI calculation
- PESI Calculator: Computes PESI (Pulmonary Embolism Severity Index) and sPESI (Simplified PESI) scores for mortality risk assessment
- ROEM Parameters Evaluator: Identifies parameters needed for Risk of Early Mortality calculation
- ROEM Patient Data Request Tool: Requests missing patient data for mortality risk assessment
- ROEM Calculator: Calculates comprehensive Risk of Early Mortality classification
- Clinical Case Evaluator: Multi-turn reasoning agent that:
  - Analyzes patient data and calculated metrics
  - Performs clinical reasoning across multiple steps
  - Coordinates information retrieval and data requests
  - Prepares for final recommendation generation
- Patient Data Request Tool: General tool for requesting missing patient parameters during clinical evaluation
- Dynamic Retrieval Tool 2: Secondary retrieval tool for guidelines information during clinical evaluation
- Clinical Case Report Generator: Generates final clinical recommendations with:
  - Patient state assessment
  - Diagnosis determination
  - Diagnostic/Treatment recommendations
  - Safety considerations and contraindications
- Finish Session: Handles session termination and cleanup
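The kind of score produced by the PESI Calculator node can be illustrated with the simplified variant. This is a minimal sketch of sPESI, assuming the standard published criteria (one point each for age over 80 years, history of cancer, chronic cardiopulmonary disease, heart rate of 110 bpm or more, systolic blood pressure below 100 mmHg, and arterial oxygen saturation below 90%). The `Patient` dataclass and its field names are hypothetical and do not reflect the repository's state schema or `metrics_calculation_nodes.py`.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical field names, chosen for illustration only.
    age: int
    history_of_cancer: bool
    chronic_cardiopulmonary_disease: bool
    heart_rate_bpm: int
    systolic_bp_mmhg: int
    spo2_percent: float


def spesi_score(p: Patient) -> int:
    """Return the sPESI score: one point per criterion met (0 to 6)."""
    criteria = [
        p.age > 80,
        p.history_of_cancer,
        p.chronic_cardiopulmonary_disease,
        p.heart_rate_bpm >= 110,
        p.systolic_bp_mmhg < 100,
        p.spo2_percent < 90,
    ]
    return sum(criteria)


def spesi_risk_class(score: int) -> str:
    """A score of 0 is low risk; one point or more is high risk."""
    return "low" if score == 0 else "high"


patient = Patient(
    age=72,
    history_of_cancer=False,
    chronic_cardiopulmonary_disease=True,
    heart_rate_bpm=115,
    systolic_bp_mmhg=118,
    spo2_percent=93.0,
)
score = spesi_score(patient)
print(score, spesi_risk_class(score))  # prints: 2 high
```

The full PESI uses weighted points and more variables; the simplified form is shown here only because its arithmetic fits in a few lines.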
- Initialization: START → Patient Processor → Query Input → Orchestrator
- Guidelines Consultation: Orchestrator → Query Solver → Dynamic Retrieval Tool → Query Input
- Clinical Evaluation: Orchestrator → PESI Parameters Evaluator → PESI Calculator → ROEM Parameters Evaluator → ROEM Calculator → Clinical Case Evaluator → Clinical Case Report Generator → Query Input
- Data Requests: Any evaluator can route to data request tools when patient information is missing
- Information Retrieval: Clinical Case Evaluator can call Dynamic Retrieval Tool 2 for additional guideline information
- Session End: Any point → Finish Session
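The flow above depends on a routing decision after each node. Here is a minimal plain-Python sketch of one such decision, taken after the PESI Parameters Evaluator, assuming a dictionary-based state. The node names and the `missing_parameters` and `finish_requested` keys are hypothetical and are not taken from `routing_functions.py` or `state_schema.py`.

```python
from typing import TypedDict


class State(TypedDict, total=False):
    # Hypothetical state keys, for illustration only.
    missing_parameters: list[str]
    finish_requested: bool


def route_after_pesi_evaluator(state: State) -> str:
    """Pick the name of the next node from the current state."""
    if state.get("finish_requested"):
        # The session can end from any point in the graph.
        return "finish_session"
    if state.get("missing_parameters"):
        # Patient data is incomplete: ask for it before calculating.
        return "pesi_patient_data_request_tool"
    return "pesi_calculator"


next_node = route_after_pesi_evaluator({"missing_parameters": ["heart_rate"]})
```

In LangGraph, a function of this shape is typically registered with `add_conditional_edges` on a `StateGraph`, so the graph follows the returned node name at run time.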
- Conditional Routing: Smart routing based on agent decisions and patient data availability
- Memory Persistence: LangGraph checkpoint system maintains conversation state
- Uncertainty Handling: Robust handling of missing data and failed operations
- Multi-turn Reasoning: Agents can perform multiple reasoning steps before generating final outputs
- Safety Checks: Built-in verification and hallucination detection at multiple stages
- Hallucination Detection: Prevents unsafe or inaccurate recommendations
- Contraindication Checking: Automatically considers patient-specific contraindications
- Evidence-Based Responses: All recommendations are grounded in official guidelines
- Transparent Reasoning: Provides step-by-step clinical reasoning
- Source Citation: All responses include references to source documents
The system has been evaluated using:
- Clinical Case Evaluation Dataset: 20 simulated patient cases
- Guidelines Consultation Dataset: Clinical questions with three difficulty levels
- Expert Validation: Clinical expert review of recommendations
- Automated Metrics: Context precision, relevance, and factual accuracy
This system is part of a research study: "Reasoning Clinical Decision Support System with Large Language Model Agents: A Case Study in Pulmonary Embolism"
The research demonstrates the effectiveness of agent-based LLM systems in clinical decision support, with particular focus on pulmonary embolism management.
For questions or support, please contact the research team or open an issue in this repository.
