Adars2005/OCR_Hackathon_Solution

🧾 Medical Report OCR Parser 🧠

This project extracts structured JSON data from scanned medical reports using Tesseract OCR and Ollama LLMs (such as Gemma or LLaMA). It is intended for automating data entry from printed lab reports.

🚀 Features

🖼️ OCR with Tesseract

📄 Medical text parsing using local LLMs via Ollama

🧠 Tested with models like gemma:2b, llama3.2:3b, and mistral

🧾 Outputs:

Clean text

Structured JSON

Debug logs
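
The flow the features describe (OCR text first, then a local LLM that returns JSON) could look roughly like the sketch below. It is not the repository's actual code: the function names, the prompt wording, and the use of Ollama's default local endpoint at `http://localhost:11434/api/generate` are all assumptions, and `ocr_image` assumes `pytesseract` and `Pillow` are installed.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ocr_image(path: str) -> str:
    """Run Tesseract on an image file (needs pytesseract and Pillow)."""
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(path))


def build_prompt(report_text: str) -> str:
    """Hypothetical prompt asking the model to answer with JSON only."""
    return (
        "Extract the patient details and test results from this lab report. "
        "Reply with a single JSON object and nothing else.\n\n" + report_text
    )


def ask_ollama(prompt: str, model: str = "llama3.2:3b") -> str:
    """Send one non-streaming generate request and return the reply text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def extract_json(reply: str) -> dict:
    """Parse the span from the first '{' to the last '}' in a model reply."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    return json.loads(reply[start:end + 1])
```

A run would then chain the pieces: `extract_json(ask_ollama(build_prompt(ocr_image(path))))`. Small models often wrap their JSON in extra prose, which is why the sketch slices out the braces before parsing.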

🔧 Setup

  1. Clone this repo

     git clone https://github.com/Adars2005/ocr-hackathon
     cd ocr-hackathon

  2. Install dependencies

     pip install -r requirements.txt

  3. Install and configure Tesseract

     Windows: Download Tesseract, then add the folder containing tesseract.exe to your system PATH.

  4. Install Ollama and pull a model

     ollama run llama3.2:3b

     This downloads the model on first use. You can also try gemma:2b or mistral.
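
Once the setup steps are done, a quick check that both command-line tools are reachable can save a confusing failure later. The helper below is a minimal sketch and not part of the repository; `missing_tools` is a hypothetical name, and it only checks that the executables are on PATH, not that a model has been pulled.

```python
import shutil


def missing_tools(tools=("tesseract", "ollama")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("tesseract and ollama are both on PATH")
```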

▶️ Run the App

    python main.py --input "input_images/input.jpeg"

📁 Output

./output/text/ – Raw extracted text

./output/json/ – Final structured JSON
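
To illustrate how the `--input` flag and the two output folders might fit together, here is a minimal sketch. It is an assumption about the layout rather than the contents of `main.py`: the helper names and the choice to name output files after the input image's stem are hypothetical.

```python
import argparse
import json
from pathlib import Path


def parse_args(argv=None):
    """Parse the single --input flag shown in the run command."""
    parser = argparse.ArgumentParser(description="Medical report OCR parser (sketch)")
    parser.add_argument("--input", required=True, help="path to the scanned report image")
    return parser.parse_args(argv)


def save_outputs(image_path, text, data, out_dir="output"):
    """Write <stem>.txt under out_dir/text and <stem>.json under out_dir/json."""
    stem = Path(image_path).stem  # hypothetical naming scheme
    text_path = Path(out_dir) / "text" / f"{stem}.txt"
    json_path = Path(out_dir) / "json" / f"{stem}.json"
    for path in (text_path, json_path):
        path.parent.mkdir(parents=True, exist_ok=True)
    text_path.write_text(text, encoding="utf-8")
    json_path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return text_path, json_path
```

With the example command, this layout would produce `output/text/input.txt` and `output/json/input.json`.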
