πŸ₯ Clinical Decision Support System for Pulmonary Embolism - An agent-based AI system using LLMs and RAG to assist healthcare professionals with evidence-based PE diagnosis and treatment recommendations.

ToniG14/Reasoning_CDSS

Reasoning Clinical Decision Support System for Pulmonary Embolism

Python LangGraph

Overview

This repository contains a Clinical Decision Support System (CDSS) designed to assist healthcare professionals in the diagnosis and treatment of Pulmonary Embolism (PE). The system integrates Large Language Model (LLM) agents with Retrieval-Augmented Generation (RAG) to provide evidence-based clinical assistance grounded in official medical guidelines.

The system is built using an agent-based architecture that combines autonomous reasoning, dynamic information retrieval, and decision-making capabilities to interpret clinical guidelines and generate personalized recommendations for patient care.

Key Features

πŸ” Two Primary Services

1. Guidelines Consultation Service

  • Allows healthcare professionals to query official PE clinical guidelines
  • Provides evidence-based answers with source citations
  • Supports complex clinical reasoning and interpretation
  • Includes hallucination detection to ensure response accuracy

2. Clinical Case Evaluation Service

  • Integrates structured patient data for personalized recommendations
  • Calculates clinical risk metrics (PESI, sPESI, Risk of Early Mortality)
  • Generates personalized diagnostic and treatment recommendations
  • Considers patient-specific contraindications and clinical context

πŸ—οΈ Advanced Architecture

  • Agent-Based System: Multi-agent workflow using LangGraph
  • RAG Integration: Retrieval-Augmented Generation for grounded responses
  • Hallucination Detection: Built-in verification system for medical safety
  • Dynamic Retrieval: Intelligent information retrieval based on clinical context
  • Structured Workflows: Step-by-step clinical reasoning process

Repository Structure

├── app.py                                  # Main application entry point
├── CDSS_demo.ipynb                         # Interactive Jupyter notebook demo
├── graph.png                               # Visual representation of the CDSS graph
├── .env.example                            # Example environment variables configuration
├── requirements.txt                        # Python dependencies
├── README.md                               # This file
│
├── data/                                   # Clinical data and guidelines
│   ├── clinical_cases/                         # Patient database
│   │   ├── clinical_cases.xlsx                     # Patient case datasets
│   │   └── pe_scores_gt.xlsx                       # Ground truth PE scores
│   │
│   └── medical_guidelines/                     # PE clinical guidelines
│       ├── processed_markdown/                     # Processed guideline documents
│       └── raw/                                    # Original PDF guidelines
│
├── src/                                    # Source code
│   ├── graph_compilation.py                    # LangGraph graph compilation
│   ├── llm_config.py                           # LLM configuration
│   │
│   ├── custom_config/                          # Custom configurations
│   │   ├── state_schema.py                         # State management schema
│   │   ├── custom_messages.py                      # Custom message types
│   │   └── routing_functions.py                    # Workflow routing logic
│   │
│   ├── nodes/                                  # Agent nodes
│   │   ├── common_nodes.py                         # Initial and shared nodes
│   │   ├── guidelines_consultation_nodes.py        # Guidelines query agents
│   │   ├── clinical_case_evaluation_nodes.py       # Clinical evaluation agents
│   │   └── metrics_calculation_nodes.py            # PESI/sPESI calculation
│   │
│   └── services/                               # Core services
│       ├── retrieval.py                            # Document retrieval
│       ├── re_ranking.py                           # Result re-ranking
│       ├── ingestion_functions.py                  # Document processing
│       ├── hallucination_detector.py               # Response verification
│       └── tools.py                                # LangGraph tools
│
├── ingestion/                              # Document ingestion pipeline
│   └── Ingestion.ipynb                         # Document processing notebook
│
├── vectorstores/                           # Vector databases
│   └── pe_protocol/                            # PE guidelines vectorstore
│
└── experimental_results/                   # Evaluation results and datasets
    ├── clinical_case_evaluation_results/       # Clinical case evaluation results
    │   ├── evaluation_by_patient/                  # Individual patient evaluation
    │   ├── clinical_case_evaluation_dataset.json   # Dataset with the CDSS final responses
    │   └── experts_evaluation_results.ods          # Expert validation results
    │
    └── guidelines_consultation_results/        # Guidelines consultation results
        ├── evaluation_questions/                   # Test questions
        ├── questions_results_by_difficulty/        # Results by question difficulty
        ├── questions_results_dataset.json          # Dataset with the CDSS final responses
        └── evaluation_score_results.ods            # Performance scores

Installation

Prerequisites

  • Python 3.11 or higher
  • OpenAI API key

Setup Instructions

  1. Clone the repository:

    git clone <repository-url>
  2. Create and activate a virtual environment:

    • Using venv:
    python -m venv <environment-name>
    
    # On Windows
    <environment-name>\Scripts\activate
    
    # On macOS/Linux
    source <environment-name>/bin/activate
    • Using conda:
    # Create the environment
    conda create -n <environment-name> python=3.11
    
    # Activate the environment
    conda activate <environment-name>
  3. Install dependencies:

    pip install -r requirements.txt
  4. Set up environment variables:

    Create a .env file in the root directory:

    OPENAI_API_KEY=your_openai_api_key_here
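
Before launching the application, it can help to confirm that the key is actually visible to the process. The check below is a minimal sketch using only the standard library. The function name `require_api_key` is hypothetical and not part of the repository, and the sketch reads the process environment, so it assumes the `.env` file has already been loaded (for example by the application itself or by your shell).

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from `env` (defaults to os.environ) or raise a clear error.

    Hypothetical helper for illustration; not part of the repository.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    # Reject both a missing key and the untouched placeholder from the example file
    if not key or key == "your_openai_api_key_here":
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return key
```

Calling `require_api_key()` at startup fails fast with a readable message instead of surfacing an authentication error on the first model call.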

Usage

Running the Application

Option 1: Command Line Interface

cd <path-to-repository>
python app.py

Option 2: Interactive Jupyter Notebook

jupyter notebook CDSS_demo.ipynb

System Workflow

  1. Patient Selection: Choose a patient from the clinical cases dataset (Patients 1-20)
  2. Service Selection:
    • Guidelines Consultation: Ask questions about PE clinical guidelines
    • Clinical Case Evaluation: Request personalized recommendations for the selected patient
  3. Interactive Process: The system will guide you through the clinical reasoning process
  4. Results: Receive evidence-based recommendations with source citations

Example Queries

Guidelines Consultation

  • "What are the diagnostic criteria for pulmonary embolism?"
  • "When should thrombolytic therapy be considered?"
  • "What are the contraindications for anticoagulation?"

Clinical Case Evaluation

  • "Help me with this clinical case"
  • "How should I proceed with the selected patient?"
  • "Analyze this patient"

Document Ingestion

The system includes a complete pipeline for ingesting new medical documents and creating vector stores.

Using the Ingestion Pipeline

  1. Open the ingestion notebook:

    jupyter notebook ingestion/Ingestion.ipynb
  2. PDF to Markdown Conversion:

    • Uses Docling for accurate medical document conversion
    • Preserves document structure and formatting
    • Handles tables, figures, and references
  3. Document Processing:

    • Intelligent chunking using markdown headers
    • Metadata extraction and cross-reference handling
    • Duplicate content removal
  4. Vector Store Creation:

    • Uses OpenAI embeddings (text-embedding-3-large)
    • Stores in Chroma vector database
    • Supports semantic search and retrieval
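
To make the header-based chunking step concrete, here is a minimal standard-library sketch. It is not the code from `ingestion_functions.py`: the function name `split_by_headers`, the chunk dictionary layout, and the `source` default are assumptions made for illustration. Each chunk keeps the path of headers above it as metadata, which is one common way to preserve context for retrieval.

```python
import re


def split_by_headers(markdown_text, source="guideline.md"):
    """Split markdown into one chunk per header section.

    Hypothetical helper: each chunk records the stack of headers it sits under.
    """
    chunks = []
    path = []    # current stack of (level, title) pairs
    buffer = []  # lines collected since the last header

    def flush():
        text = "\n".join(buffer).strip()
        if text:
            chunks.append({
                "text": text,
                "metadata": {"source": source, "headers": [t for _, t in path]},
            })
        buffer.clear()

    for line in markdown_text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if match:
            flush()  # close the section that was open before this header
            level = len(match.group(1))
            # Drop headers at the same or a deeper level before pushing the new one
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            buffer.append(line)
    flush()
    return chunks
```

This sketch treats any line starting with `#` as a header, so it would mis-split markdown that contains fenced code with `#` comments; a production splitter needs to handle that case.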

Ingestion Steps

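# Imports assumed here; match them to the packages pinned in requirements.txt
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# 1. Convert PDF to Markdown
convert_pdf_to_markdown(pdf_path, markdown_path)

# 2. Read the converted markdown files
markdown_documents = read_markdown_files(markdown_folder)

# 3. Create vector store
# processed_chunks is the list of chunks produced from markdown_documents
# by the chunking step in ingestion/Ingestion.ipynb
vectorstore = Chroma.from_documents(
    documents=processed_chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-large"),
    collection_name="your_collection_name",
    persist_directory="vectorstores/your_directory"
)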

System Architecture: Complete Node Graph

The CDSS is built as a multi-agent workflow using LangGraph. Below is the complete system architecture showing all nodes and their relationships:

System Graph

Complete Node Breakdown

🏁 Entry Point Nodes

  • START: System initialization point
  • Patient Processor: Loads patient data from Excel files, handles patient selection (1-20), and initializes patient state
  • Query Input: Manages user input and query processing, handles interactive communication with healthcare professionals
  • Orchestrator: Central routing agent that classifies queries and directs them to appropriate service pathways

📋 Guidelines Consultation Pathway

  • Query Solver: Main agent for processing clinical guideline queries, performs multi-step reasoning and coordinates information retrieval
  • Dynamic Retrieval Tool: Retrieves relevant information from PE guidelines vectorstore, supports semantic search and re-ranking
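
The retrieve-then-re-rank pattern can be illustrated with a small self-contained sketch. It deliberately swaps in simpler techniques: bag-of-words cosine similarity stands in for the embedding search over the vectorstore, and a query-term coverage score stands in for the re-ranker. All function names here are hypothetical and do not reflect the contents of `retrieval.py` or `re_ranking.py`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=3):
    """Stage 1: cheap similarity search over all chunks (stand-in for vector search)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep at most k candidates and drop those with no overlap at all
    return [c for score, c in scored[:k] if score > 0]


def rerank(query, candidates, top_n=2):
    """Stage 2: re-score the shortlist by the fraction of query terms each chunk covers."""
    terms = set(tokenize(query))

    def coverage(chunk):
        return len(terms & set(tokenize(chunk))) / len(terms) if terms else 0.0

    return sorted(candidates, key=coverage, reverse=True)[:top_n]
```

The design point is the split itself: the first stage is cheap and runs over the whole collection, while the second stage applies a more careful score to a short list only.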

πŸ₯ Clinical Case Evaluation Pathway

Clinical Metrics Calculation

  • PESI Parameters Evaluator: Identifies required parameters for PESI/sPESI calculation, checks for missing patient data
  • PESI Patient Data Request Tool: Requests missing patient data specifically for PESI calculation
  • PESI Calculator: Computes PESI (Pulmonary Embolism Severity Index) and sPESI (Simplified PESI) scores for mortality risk assessment
  • ROEM Parameters Evaluator: Identifies parameters needed for Risk of Early Mortality calculation
  • ROEM Patient Data Request Tool: Requests missing patient data for mortality risk assessment
  • ROEM Calculator: Calculates comprehensive Risk of Early Mortality classification
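
As an illustration of the evaluator and calculator steps, the sketch below checks for missing inputs and then computes sPESI using the published criteria (one point each for age over 80, history of cancer, chronic cardiopulmonary disease, heart rate of at least 110 bpm, systolic blood pressure under 100 mmHg, and oxygen saturation under 90%). The field names and function names are hypothetical; the repository's `metrics_calculation_nodes.py` may structure patient data differently.

```python
# Hypothetical field names for the six sPESI inputs
SPESI_FIELDS = (
    "age",
    "cancer",
    "chronic_cardiopulmonary_disease",
    "heart_rate",
    "systolic_bp",
    "spo2",
)


def missing_spesi_fields(patient):
    """Return the sPESI inputs that are absent (or None) in the patient record."""
    return [f for f in SPESI_FIELDS if patient.get(f) is None]


def spesi(patient):
    """Compute the sPESI score (0-6) and its risk category."""
    missing = missing_spesi_fields(patient)
    if missing:
        raise ValueError(f"Missing sPESI parameters: {missing}")
    score = sum([
        patient["age"] > 80,                               # years
        bool(patient["cancer"]),
        bool(patient["chronic_cardiopulmonary_disease"]),
        patient["heart_rate"] >= 110,                      # beats per minute
        patient["systolic_bp"] < 100,                      # mmHg
        patient["spo2"] < 90,                              # percent
    ])
    # A score of 0 is low risk; any positive score is high risk
    return score, ("low" if score == 0 else "high")
```

In the workflow described above, a non-empty result from `missing_spesi_fields` would correspond to routing to the PESI Patient Data Request Tool before the calculator runs.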

Clinical Decision Making

  • Clinical Case Evaluator: Multi-turn reasoning agent that:
    • Analyzes patient data and calculated metrics
    • Performs clinical reasoning across multiple steps
    • Coordinates information retrieval and data requests
    • Prepares for final recommendation generation
  • Patient Data Request Tool: General tool for requesting missing patient parameters during clinical evaluation
  • Dynamic Retrieval Tool 2: Secondary retrieval tool for guidelines information during clinical evaluation
  • Clinical Case Report Generator: Generates final clinical recommendations with:
    • Patient state assessment
    • Diagnosis determination
    • Diagnostic/Treatment recommendations
    • Safety considerations and contraindications

🔚 Session Management

  • Finish Session: Handles session termination and cleanup

Node Interaction Flow

  1. Initialization: START → Patient Processor → Query Input → Orchestrator
  2. Guidelines Consultation: Orchestrator → Query Solver ⇄ Dynamic Retrieval Tool → Query Input
  3. Clinical Evaluation: Orchestrator → PESI Parameters Evaluator → PESI Calculator → ROEM Parameters Evaluator → ROEM Calculator → Clinical Case Evaluator → Clinical Case Report Generator → Query Input
  4. Data Requests: Any evaluator can route to data request tools when patient information is missing
  5. Information Retrieval: Clinical Case Evaluator can call Dynamic Retrieval Tool 2 for additional guideline information
  6. Session End: Any point → Finish Session
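
The routing decisions in this flow can be sketched as plain functions that inspect the shared state and return the name of the next node. This is a hedged illustration, not the contents of `routing_functions.py`: the state keys (`finish`, `query_type`, `missing_parameters`), the node names, and the fallback to `query_input` for unclassified queries are all assumptions.

```python
def route_from_orchestrator(state):
    """Pick the next node from the orchestrator's query classification."""
    if state.get("finish"):
        return "finish_session"
    if state.get("query_type") == "guidelines_consultation":
        return "query_solver"
    if state.get("query_type") == "clinical_case_evaluation":
        return "pesi_parameters_evaluator"
    # Assumed fallback: an unclassified query goes back to the user
    return "query_input"


def route_from_pesi_evaluator(state):
    """Request data when PESI inputs are missing, otherwise go on to the calculator."""
    if state.get("missing_parameters"):
        return "pesi_data_request_tool"
    return "pesi_calculator"
```

In LangGraph, functions of this shape are typically registered on a `StateGraph` with `add_conditional_edges`, so each returned string selects the outgoing edge.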

Key Features of the Architecture

  • Conditional Routing: Smart routing based on agent decisions and patient data availability
  • Memory Persistence: LangGraph checkpoint system maintains conversation state
  • Uncertainty Handling: Robust handling of missing data and failed operations
  • Multi-turn Reasoning: Agents can perform multiple reasoning steps before generating final outputs
  • Safety Checks: Built-in verification and hallucination detection at multiple stages

Safety Features

  • Hallucination Detection: Verifies generated responses to reduce the risk of unsafe or inaccurate recommendations
  • Contraindication Checking: Automatically considers patient-specific contraindications
  • Evidence-Based Responses: All recommendations are grounded in official guidelines
  • Transparent Reasoning: Provides step-by-step clinical reasoning
  • Source Citation: All responses include references to source documents

Evaluation

The system has been evaluated using:

  • Clinical Case Evaluation Dataset: 20 simulated patient cases
  • Guidelines Consultation Dataset: Clinical questions with three difficulty levels
  • Expert Validation: Clinical expert review of recommendations
  • Automated Metrics: Context precision, relevance, and factual accuracy

Research Publication

This system is part of a research study: "Reasoning Clinical Decision Support System with Large Language Model Agents: A Case Study in Pulmonary Embolism"

The research demonstrates the effectiveness of agent-based LLM systems in clinical decision support, with particular focus on pulmonary embolism management.

Contact

For questions or support, please contact the research team or open an issue in this repository.


⚠️ Important Medical Disclaimer: This system is designed for research purposes and to assist healthcare professionals. It should not replace professional medical judgment or be used as the sole basis for clinical decisions. Always consult with qualified healthcare providers for patient care decisions.
