🧾 Medical Report OCR Parser 🧠

This project extracts structured JSON data from scanned medical reports using Tesseract OCR and Ollama LLMs (like Gemma or LLaMA). Ideal for automating data entry from printed lab reports.

🚀 Features

🖼️ OCR with Tesseract

📄 Medical text parsing using local LLMs via Ollama

🧠 Tested with models like gemma:2b, llama3.2:3b, and mistral

🧾 Outputs:

Clean text

Structured JSON

Debug logs
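
The LLM parsing step above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the function names `build_prompt` and `extract_json`, the prompt wording, and the sample reply are all assumptions made for the example. It shows one way to ask a model for JSON and then recover the first JSON object even when the reply contains extra prose.

```python
import json


def build_prompt(ocr_text: str) -> str:
    """Wrap raw OCR text in an instruction asking for JSON only (hypothetical wording)."""
    return (
        "Extract the patient details and test results from the medical report "
        "below. Reply with a single JSON object and nothing else.\n\n"
        f"REPORT:\n{ocr_text.strip()}"
    )


def extract_json(reply: str) -> dict:
    """Return the first JSON object found in a model reply that may contain extra text."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(reply):
        if ch != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and ignores the rest
            obj, _ = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in model reply")


if __name__ == "__main__":
    # Made-up reply, for illustration only
    sample_reply = 'Here is the result: {"test": "Hemoglobin", "unit": "g/dL"} Done.'
    print(extract_json(sample_reply))
```

Scanning for the first parseable object is useful with small local models, which often add a sentence before or after the JSON they were asked for.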

🔧 Setup

Clone this repo

    git clone https://github.com/Adars2005/ocr-hackathon
    cd ocr-hackathon

Install dependencies

    pip install -r requirements.txt

Install & Configure Tesseract

Windows: Download Tesseract

Add path to tesseract.exe in your system PATH

Install Ollama and Pull Model

    ollama run llama3.2:3b

Or try another model, such as gemma:2b or mistral.
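
Once the steps above are done, a quick check along these lines can confirm the setup. It is a sketch using only the Python standard library and is not part of this repo: `find_tesseract`, `build_request`, and `ask_ollama` are hypothetical helper names. It assumes Ollama is listening on its default local address (http://localhost:11434) and that the model has already been pulled.

```python
import json
import shutil
import urllib.request

# Ollama's default local endpoint; change it if your server runs elsewhere
OLLAMA_URL = "http://localhost:11434/api/generate"


def find_tesseract():
    """Return the path to the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the model's text reply (requires a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print("tesseract:", find_tesseract() or "NOT FOUND on PATH")
    print("ollama reply:", ask_ollama("llama3.2:3b", "Reply with the word ok."))
```

If `find_tesseract()` returns None, the PATH entry for tesseract.exe is probably missing. A connection error from `ask_ollama` usually means the Ollama server is not running.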

▶️ Run the App

    python main.py --input "input_images/input.jpeg"

📁 Output

./output/text/ – Raw extracted text

./output/json/ – Final structured JSON
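
The output layout above could be produced by a helper like the one below. This is only a sketch: `save_outputs` is a hypothetical name, and naming each output file after the input image's stem is an assumption, not something main.py is confirmed to do.

```python
import json
from pathlib import Path


def save_outputs(image_path: str, text: str, data: dict, root: str = "output"):
    """Write raw OCR text to <root>/text/ and structured JSON to <root>/json/."""
    stem = Path(image_path).stem  # "input_images/input.jpeg" -> "input" (assumed naming)
    text_path = Path(root) / "text" / f"{stem}.txt"
    json_path = Path(root) / "json" / f"{stem}.json"
    for path in (text_path, json_path):
        path.parent.mkdir(parents=True, exist_ok=True)  # create folders on first run
    text_path.write_text(text, encoding="utf-8")
    json_path.write_text(
        json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return text_path, json_path
```

Writing the raw text alongside the JSON makes it easier to tell whether a bad field came from the OCR step or from the model.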

