A Premium, Agentic Document Intelligence Platform that combines Traditional OCR with Generative AI Vision.
This project was designed and developed by students from Nutan College of Engineering and Research, Talegaon Dhabade, Pune, under the Intel Unnati Industrial Training 2025 Programme. Feel free to connect with us and explore our work!
A huge thank you to our mentor for their guidance, insights, and continuous support throughout the development of this project.
| Landing Page | Dashboard & Chat |
|---|---|
| ![]() | ![]() |

| Inspector | Knowledge Base |
|---|---|
| ![]() | ![]() |
👉 https://youtu.be/nZSyUKTinMs
This video demonstrates:
- End-to-end document ingestion
- OCR vs Gemini Vision switching
- Visual RAG responses
- Inspector & Knowledge Base auditing
The Intel Enterprise Document Analyzer is designed to solve the "Last Mile" problem of document intelligence: extracting structured data from unstructured, messy real-world PDFs. By dynamically switching between Tesseract OCR (for speed) and Google Gemini Vision (for reasoning), it achieves high accuracy even on handwritten or complex documents.
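A minimal sketch of how such a routing decision could look, assuming Tesseract confidence scores drive the escalation; the threshold and the `gemini_vision_extract()` helper are illustrative assumptions, not the project's actual API.

```python
# Minimal routing sketch -- the threshold and gemini_vision_extract() are
# illustrative assumptions, not the project's real implementation.
import pytesseract
from PIL import Image


def extract_page_text(page_image: Image.Image) -> str:
    """Run fast Tesseract OCR first; escalate hard pages to Gemini Vision."""
    data = pytesseract.image_to_data(page_image, output_type=pytesseract.Output.DICT)
    confidences = [float(c) for c in data["conf"] if float(c) >= 0]
    mean_conf = sum(confidences) / len(confidences) if confidences else 0.0

    # Clean digital or well-scanned pages: Tesseract is fast and accurate enough.
    if mean_conf >= 60:  # assumed confidence cut-off
        return pytesseract.image_to_string(page_image)

    # Handwriting or noisy scans: hand the page image to the vision model.
    return gemini_vision_extract(page_image)  # hypothetical wrapper around the Gemini API
```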
📘 Read the Full User Guide
📘 Read the System Architecture
📘 Read the API Reference
- Hybrid Extraction Engine: Automatically handles Digital vs. Scanned vs. Handwriting PDFs
- Deep Artifact Extraction: Isolates tables into DataFrames and crops images for separate indexing (a minimal extraction sketch follows this list)
- Visual RAG: Search results provide not just text, but visual "evidence" crops from the original PDF
- Agentic Inspector: Dedicated UI to audit every chunk, image, and table found in a document
- Enterprise-Ready Pipeline: Modular ingestion, parsing, chunking, embedding, and retrieval
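As referenced above, here is a minimal sketch of isolating tables into DataFrames. It uses pdfplumber purely for illustration; the project's actual extraction stack (e.g. Docling) may expose tables differently.

```python
# Illustrative only: pdfplumber stands in for the project's real table extractor.
import pdfplumber
import pandas as pd


def extract_tables_as_dataframes(pdf_path: str) -> list[pd.DataFrame]:
    """Collect every detected table as its own DataFrame so it can be
    chunked and indexed separately from the running text."""
    frames = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                if len(table) < 2:  # need at least a header row and one data row
                    continue
                header, *rows = table
                frames.append(pd.DataFrame(rows, columns=header))
    return frames
```

Each DataFrame can then be serialized (e.g. to Markdown or CSV) before embedding, so table structure survives retrieval.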
- Python 3.9+
- Docling OCR
- Google Gemini API Key
- Clone & Install

  ```bash
  git clone <repo>
  cd IntelProject
  pip install -r requirements.txt
  ```

- Configure API Key (a hypothetical sketch of `backend/config.py` follows these steps)

  ```bash
  export GOOGLE_API_KEY="your_api_key"  # or configure inside backend/config.py
  ```

- Run the System (Two Terminals)

  Backend (FastAPI):

  ```bash
  uvicorn backend.main:app --host 127.0.0.1 --port 8000
  ```

  Frontend (Streamlit):

  ```bash
  streamlit run frontend/app.py
  ```
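As noted in the Configure API Key step, this is a hypothetical sketch of what `backend/config.py` might look like if you configure the key there; the real module may differ, but the point is to read the key from the environment rather than hardcode it.

```python
# Hypothetical backend/config.py -- shown only to illustrate keeping the key
# out of source control; the project's actual config module may differ.
import os

GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY", "")

if not GOOGLE_API_KEY:
    raise RuntimeError(
        "GOOGLE_API_KEY is not set. Export it in your shell before starting the backend."
    )
```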
The backend expects the following structure:
```
data/
├── processed/
├── uploads/
├── vectordb/
└── static/
    ├── images/
    ├── pages/
    └── pdfs/
```
Create manually if needed:
```bash
mkdir -p data/processed \
         data/static/images \
         data/static/pages \
         data/static/pdfs \
         data/uploads \
         data/vectordb
```
- FastAPI Backend: High-performance async API (a hedged client sketch follows this list)
- Streamlit Frontend: Interactive enterprise dashboard
- Vector Store: Persistent document embeddings
- Vision + Text RAG: Multi-modal retrieval with visual grounding
- Inspector Mode: Transparent AI auditing
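To make the backend/frontend split concrete, here is a hedged client-side sketch. The `/query` endpoint name and the response fields are assumptions for illustration only; check the API Reference for the real contract.

```python
# Assumed endpoint and payload shape -- see the API Reference for the actual contract.
import requests

BACKEND_URL = "http://127.0.0.1:8000"


def ask(question: str) -> dict:
    """Send a question to the backend and return the RAG answer together with
    any visual-evidence crops the retriever attached."""
    response = requests.post(
        f"{BACKEND_URL}/query", json={"question": question}, timeout=60
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    result = ask("What is the invoice total on page 3?")
    print(result.get("answer"))
    print(result.get("evidence_images"))  # e.g. crops served from data/static/images
```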
- API keys are never hardcoded (environment variables are recommended)
- Containers run in isolation on a Docker network
- Designed for private VM / enterprise network usage
- User Guide: Detailed walkthrough of features
- System Architecture: Technical deep dive
- API Reference: Backend endpoints
MIT License
Give it a star ⭐ and feel free to fork or extend it for research, enterprise pilots, or hackathons.



