A lightweight, local Retrieval-Augmented Generation (RAG) system for analyzing diabetes-related patient profiles using LangChain, ChromaDB, and Ollama.
It allows natural language queries over structured CSV medical data, enabling fast and private exploration of patient data — entirely offline.
- A CSV dataset of patient profiles (medical features like BMI, blood pressure, glucose, etc.) is loaded.
- Each row is transformed into a textual medical profile (e.g. "55-year-old male with high glucose and low HDL...").
- These profiles are embedded using the `mxbai-embed-large` model and stored in a Chroma vector database.
- When you ask a question, the most relevant profiles are retrieved (based on semantic similarity).
- A custom prompt is created using those profiles and sent to a local LLM (e.g. `llama3`) via Ollama.
- The model generates an informed medical response.
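The steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names `row_to_profile`, `top_k`, and `build_prompt` are hypothetical, the profile wording and prompt text are assumptions, and the embedding vectors are passed in as plain lists rather than produced by `mxbai-embed-large` or stored in Chroma.

```python
import math


def row_to_profile(row: dict) -> str:
    """Turn one CSV row (column name -> value) into a short text profile.

    The wording and the choice of columns are assumptions for illustration.
    """
    return (
        f"{row['Age']}-year-old {str(row['Sex']).lower()} with BMI {row['BMI']}, "
        f"fasting glucose {row['Fasting_Blood_Glucose']}, HbA1c {row['HbA1c']}, "
        f"HDL cholesterol {row['Cholesterol_HDL']}."
    )


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=2) -> list:
    """Return the indices of the k vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, profiles: list) -> str:
    """Combine the retrieved profiles and the question into one prompt string."""
    context = "\n".join(f"- {p}" for p in profiles)
    return (
        "Use these patient profiles to answer the question.\n\n"
        f"Profiles:\n{context}\n\n"
        f"Question: {question}"
    )
```

In the real system, the vector store performs the similarity search and the resulting prompt is sent to the local model through Ollama; the sketch only shows how the pieces relate.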
1. Clone the repository

git clone https://github.com/mattialoszach/local-rag.git
cd local-rag

2. Install the dependencies

pip install -r requirements.txt

3. Install Ollama and the required models

ollama pull llama3
ollama pull mxbai-embed-large

The system expects a CSV file named `diabetes_dataset.csv` in the root directory, with the following columns:

id, Age, Sex, Ethnicity, BMI, Waist_Circumference, Fasting_Blood_Glucose, HbA1c, Blood_Pressure_Systolic, Blood_Pressure_Diastolic, Cholesterol_Total, Cholesterol_HDL, Cholesterol_LDL, GGT, Serum_Urate, Physical_Activity_Level, Dietary_Intake_Calories, Alcohol_Consumption, Smoking_Status, Family_History_of_Diabetes, Previous_Gestational_Diabetes

You can replace the dataset with your own medical records, as long as they use the same format. The original dataset is available on Kaggle.

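Before swapping in your own records, it can help to check that the file has the expected header. The following is a small sketch using only the standard library; `missing_columns` and `EXPECTED_COLUMNS` are hypothetical names introduced here and are not part of the project.

```python
import csv
import io

# Column names as listed above.
EXPECTED_COLUMNS = [
    "id", "Age", "Sex", "Ethnicity", "BMI", "Waist_Circumference",
    "Fasting_Blood_Glucose", "HbA1c", "Blood_Pressure_Systolic",
    "Blood_Pressure_Diastolic", "Cholesterol_Total", "Cholesterol_HDL",
    "Cholesterol_LDL", "GGT", "Serum_Urate", "Physical_Activity_Level",
    "Dietary_Intake_Calories", "Alcohol_Consumption", "Smoking_Status",
    "Family_History_of_Diabetes", "Previous_Gestational_Diabetes",
]


def missing_columns(csv_file) -> list:
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    # fieldnames is None for an empty file, so fall back to an empty list.
    header = [name.strip() for name in (reader.fieldnames or [])]
    return [col for col in EXPECTED_COLUMNS if col not in header]


# Example with an in-memory file that only has the first three columns.
sample = io.StringIO("id,Age,Sex\n1,55,Male\n")
print(missing_columns(sample))
```

For a file on disk, the same function can be called on `open("diabetes_dataset.csv", newline="")`.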
python main.py

Then simply type natural language questions, such as:
- “Show patients with high fasting glucose and low HDL.”
- “Which profiles have signs of metabolic syndrome?”
- “Find people with a family history of diabetes and obesity.”
Type '/q', '/quit' or '/exit' to quit.
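An interactive loop with these quit commands could look roughly like the sketch below. It is an assumption about the shape of such a loop, not a copy of `main.py`: `chat_loop`, `is_quit`, and the `answer` callback are hypothetical names, and trimming whitespace and ignoring case are choices made here for convenience.

```python
QUIT_COMMANDS = {"/q", "/quit", "/exit"}


def is_quit(text: str) -> bool:
    """True if the input is one of the quit commands (whitespace and case ignored)."""
    return text.strip().lower() in QUIT_COMMANDS


def chat_loop(answer, read=input, write=print) -> None:
    """Read questions until a quit command or end of input.

    `answer` is any function that maps a question string to a response string,
    for example a wrapper around the retrieval and LLM call.
    """
    while True:
        try:
            question = read("Ask a question: ")
        except EOFError:
            break
        if is_quit(question):
            break
        if question.strip():  # skip empty input
            write(answer(question))
```

Passing `read` and `write` as parameters keeps the loop easy to test without a terminal.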
This tool is ideal for:
- Local clinical research and education
- Medical case exploration and risk group analysis
- Privacy-preserving data querying without cloud APIs
- Prototyping AI-based assistants for structured health data
Possible future improvements:
- Add a web interface or GUI
- Support PDF or JSON patient records
- Integrate a diagnosis suggestion model
- Include follow-up questions (multi-turn RAG)
- Add further API support
All processing is done locally:
- No cloud APIs
- All data stays on your machine
- Perfect for sensitive health-related use
