🧾 Medical Report OCR Parser
🧠 This project extracts structured JSON data from scanned medical reports using Tesseract OCR and Ollama LLMs (like Gemma or LLaMA). Ideal for automating data entry from printed lab reports.
🚀 Features
🖼️ OCR with Tesseract
📄 Medical text parsing using local LLMs via Ollama
🧠 Tested with models like gemma:2b, llama3.2:3b, and mistral
🧾 Outputs:
Clean text
Structured JSON
Debug logs
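The flow described above (OCR first, then LLM parsing into JSON) might look roughly like the sketch below. It is not the repository's actual code: the function names `ocr_image`, `build_payload`, and `parse_report`, the prompt wording, and the default model are assumptions made for illustration. It assumes `pytesseract` and Pillow are installed and that Ollama is serving its REST API on the default local port.

```python
import json
import urllib.request

# Ollama's default local generate endpoint (assumes a default install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def ocr_image(image_path: str) -> str:
    """Run Tesseract on one scanned page and return the raw text."""
    # Imported lazily so the rest of the sketch works without these packages.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path))


def build_payload(report_text: str, model: str = "llama3.2:3b") -> dict:
    """Build the request body for Ollama; the prompt text is hypothetical."""
    prompt = (
        "Extract the patient details and lab results from this medical "
        "report and reply with a single JSON object only.\n\n" + report_text
    )
    # "format": "json" asks Ollama to constrain the reply to valid JSON.
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def parse_report(report_text: str, model: str = "llama3.2:3b") -> dict:
    """Send the OCR text to the local model and decode its JSON reply."""
    body = json.dumps(build_payload(report_text, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.loads(response.read().decode("utf-8"))
    # With streaming disabled, the generated text is in the "response" field.
    return json.loads(reply["response"])
```

A typical call would be `parse_report(ocr_image("report.png"))`, where `report.png` is a placeholder file name. Small models can still return incomplete fields, so the parsed result is worth validating before it is used for data entry.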
🔧 Setup
- Clone this repo

      git clone https://github.com/Adars2005/ocr-hackathon
      cd ocr-hackathon
- Install dependencies

      pip install -r requirements.txt
- Install & Configure Tesseract
  Windows: Download Tesseract, then add the folder containing tesseract.exe to your system PATH
- Install Ollama and Pull Model

      ollama run llama3.2:3b
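Once the setup steps are done, a quick check can confirm that both tools are reachable before running the parser. The sketch below is only an illustration: the helper names `find_tesseract` and `ollama_is_running` are hypothetical, and it assumes Ollama is listening on its default address, `http://localhost:11434`.

```python
import shutil
import urllib.request


def find_tesseract():
    """Return the full path of the tesseract executable, or None if not on PATH."""
    return shutil.which("tesseract")


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib's URLError/HTTPError.
        return False


if __name__ == "__main__":
    print("Tesseract:", find_tesseract() or "not found on PATH")
    print("Ollama running:", ollama_is_running())
```

If Tesseract is reported as missing on Windows, the PATH entry from the earlier step is the first thing to recheck; a new terminal is usually needed for PATH changes to take effect.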
./output/json/ – Final structured JSON
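Writing a parsed report into that folder could be done along these lines. This is a sketch under assumptions: the helper name `save_report_json` and the choice to name each file after its source image are illustrative, not taken from the repository.

```python
import json
from pathlib import Path


def save_report_json(data: dict, source_name: str, out_dir: str = "./output/json") -> Path:
    """Write one parsed report as pretty-printed JSON and return the file path."""
    out_path = Path(out_dir)
    # Create ./output/json/ (and parents) on first use.
    out_path.mkdir(parents=True, exist_ok=True)
    # Reuse the source file's base name, e.g. report.png -> report.json.
    target = out_path / (Path(source_name).stem + ".json")
    target.write_text(
        json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return target
```

Keeping `ensure_ascii=False` preserves non-ASCII characters, such as unit symbols or patient names, exactly as they were extracted.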