A sophisticated document intelligence system implementing advanced Retrieval-Augmented Generation (RAG) methodologies for comprehensive PDF analysis and interactive question-answering. The platform features a modular microservices architecture optimized for local deployment with enterprise-grade performance characteristics.
Paper in Demo: Attention Is All You Need by Vaswani et al.
Available Online at: https://arxiv.org/abs/1706.03762
OpenRAGSearch employs a multi-layered RAG pipeline with adaptive document processing capabilities, featuring a decoupled client-server architecture designed for scalability and extensibility.
Backend Infrastructure:
- FastAPI - Asynchronous ASGI framework with automatic OpenAPI schema generation and type validation
- LangChain - Comprehensive RAG orchestration framework with custom prompt engineering and retrieval strategies
- Ollama Integration - Local large language model inference with dynamic model management and intelligent fallback mechanisms
- ChromaDB - High-performance vector database with cosine similarity optimization and persistent storage capabilities
- Hybrid Embedding Pipeline - Dual-mode embedding architecture supporting Ollama (`nomic-embed-text`) with SentenceTransformers fallback
Frontend Architecture:
- Modular JavaScript ES6+ - Component-based architecture with class inheritance and dependency injection patterns
- PDF.js Integration - Client-side PDF rendering engine with advanced viewport management and annotation capabilities
- CSS Grid/Flexbox Layout - Professional interface design inspired by Adobe Acrobat with responsive breakpoints
- Fetch API Integration - RESTful communication layer with comprehensive error handling and loading state management
The ingestion system implements a hierarchical document processing strategy optimized for academic and technical literature:
```python
# Semantic-aware chunking configuration
RecursiveCharacterTextSplitter(
    chunk_size=400,       # Optimized for academic content density
    chunk_overlap=100,    # 25% overlap ensuring contextual continuity
    separators=["\n\n", "\n", ". ", " ", ""]  # Hierarchical boundary detection
)
```

Key Processing Features:
- Metadata Enrichment: Automatic page number extraction and chunk identification
- Content Analysis: Statistical metadata generation including line count, paragraph segmentation, and character distribution
- Hierarchical Splitting: Respect for document structure boundaries (paragraphs, sentences, tokens)
Primary Embedding Model: Ollama `nomic-embed-text` (768-dimensional vectors)
Fallback Architecture: SentenceTransformers `all-MiniLM-L6-v2` for offline compatibility
The embedding pipeline incorporates:
- Runtime Model Detection: Dynamic availability checking with graceful degradation
- Batch Processing: Optimized memory utilization for large document sets
- Normalized Vector Storage: L2 normalization for consistent similarity computations
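As a concrete illustration of the runtime detection and fallback just listed, here is a minimal Python sketch; the function name `select_embeddings` and the probe via `ollama.show` are illustrative assumptions, not the project's actual API:

```python
import ollama
from langchain_community.embeddings import HuggingFaceEmbeddings, OllamaEmbeddings

def select_embeddings():
    """Prefer the local Ollama embedding model; degrade to SentenceTransformers."""
    try:
        # Raises if the Ollama daemon is unreachable or the model is absent
        ollama.show("nomic-embed-text")
        return OllamaEmbeddings(model="nomic-embed-text")
    except Exception:
        # Offline fallback: runs fully in-process via sentence-transformers
        return HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
```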
Multi-Stage Retrieval Process:
- Semantic Similarity Search: Cosine similarity with configurable k-value (default: 15)
- Citation Filtering: Intelligent classification and separation of reference-heavy content
- Page-Specific Queries: Targeted retrieval for location-based questions
- Context Ranking: Relevance scoring incorporating multiple factors including content type and document position
Query Enhancement Strategies:
- Contextual Query Expansion: Automatic terminology augmentation for domain-specific queries
- Page-Aware Retrieval: Regex-based page number detection with targeted document section retrieval
- Content Type Classification: Separation of substantive content from bibliographic references
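The page-aware strategy can be sketched as follows; the regex and the helper names are illustrative assumptions rather than code from the repository:

```python
import re
from typing import Optional

PAGE_PATTERN = re.compile(r"\bpage\s+(\d+)\b", re.IGNORECASE)

def detect_page_query(question: str) -> Optional[int]:
    """Return the page number a question targets, or None for document-wide queries."""
    match = PAGE_PATTERN.search(question)
    return int(match.group(1)) if match else None

def filter_by_page(chunks, page: int):
    """Keep only chunks whose metadata marks them as coming from the target page."""
    return [c for c in chunks if c.metadata.get("page") == page]

# detect_page_query("What is discussed on page 5 of this document?")  -> 5
# detect_page_query("Summarize the key contributions")                -> None
```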
Ollama LLM Wrapper: Custom implementation supporting multiple model architectures
- Model Validation: Runtime availability verification with comprehensive error handling
- Chat-Based Interface: Structured message passing with role-based prompt engineering
- Fallback Mechanisms: Graceful degradation strategies for model unavailability
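A hedged sketch of what such a wrapper can look like is shown below; the class shape and the `ollama.show` validation probe are assumptions for illustration:

```python
import ollama

class OllamaChatLLM:
    """Chat-style wrapper with runtime model validation."""

    def __init__(self, model: str = "gemma2:2b"):
        try:
            ollama.show(model)  # verify the daemon is up and the model is pulled
        except Exception as exc:
            raise RuntimeError(
                f"Model '{model}' unavailable; run `ollama pull {model}` "
                "and ensure the Ollama daemon is running"
            ) from exc
        self.model = model

    def invoke(self, system_prompt: str, question: str) -> str:
        # Structured, role-based message passing as described above
        response = ollama.chat(
            model=self.model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
            ],
        )
        return response["message"]["content"]
```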
Prompt Engineering Framework:
```python
# Research-optimized prompt template with structured output requirements
qa_template = """You are an expert research assistant analyzing academic papers.
Your goal is to provide clear, accurate, and comprehensive answers based solely
on the provided context.

INSTRUCTIONS:
- Answer directly with comprehensive technical details
- Always cite page numbers using format (p. X)
- Include specific technical terms and methodological details
- Focus on key contributions and empirical findings
- Provide critical analysis when appropriate"""
```

- Multi-Format Ingestion: Optimized PDF parsing with PyPDFLoader and comprehensive metadata extraction
- Adaptive Chunking: Semantic-aware text segmentation preserving document hierarchy and logical structure
- Memory-Efficient Processing: Progressive loading architecture with optimized resource management
- Structural Analysis: Automatic recognition of document sections, citations, and content classification
- Context-Aware Retrieval: Multi-stage filtering incorporating semantic similarity and relevance scoring
- Citation-Aware Responses: Automatic page reference extraction with interactive navigation capabilities
- Query Type Detection: Intelligent classification of page-specific versus document-wide queries
- Adaptive Context Windows: Dynamic context sizing based on query complexity and content type
- Modular Component Architecture: ES6+ class-based modules with dependency injection and event management
- Advanced PDF Visualization: Zoom controls, rotation, navigation, and annotation overlay systems
- Real-Time Collaboration Tools: Annotation management with export/import capabilities
- Responsive Design Patterns: Optimized for desktop workflows with adaptive panel management
- Local-First Architecture: Complete offline functionality with no external API dependencies
- Session Persistence: Automatic state management with recovery capabilities
- Data Privacy Compliance: All processing occurs locally with zero data transmission
- Extensible Plugin System: Modular design supporting custom processing modules and UI components
```text
# Minimum System Specifications
Python 3.8+ with pip package manager
Ollama runtime environment (v0.5.1+)
4GB+ RAM (8GB recommended for optimal performance)
2GB+ storage for model artifacts
```

```bash
# Repository acquisition and setup
git clone https://github.com/your-username/openragsearch.git
cd openragsearch

# Execute automated deployment script
chmod +x run.sh
./run.sh
```

Automated Setup Process:
- Virtual environment creation and dependency isolation
- Package installation with version pinning and dependency resolution
- Ollama model downloading with verification (`gemma2:2b`, `nomic-embed-text`)
- Database initialization and health check validation
- Development server startup with hot-reload capabilities
```bash
# 1. Ollama Runtime Installation
curl -fsSL https://ollama.ai/install.sh | sh

# 2. Model Acquisition
ollama pull nomic-embed-text   # 274MB - Embedding model
ollama pull gemma2:2b          # 1.6GB - Primary LLM

# 3. Python Environment Setup
python3 -m venv .venv
source .venv/bin/activate

# 4. Dependency Installation
pip install -r requirements.txt

# 5. Application Launch
uvicorn app:app --host 0.0.0.0 --port 8000 --reload
```

```python
# Document processing configuration (ingestion.py)
PDFIngestion(
    chunk_size=400,       # Optimized for academic content
    chunk_overlap=100,    # Contextual continuity preservation
    separators=custom     # Hierarchical boundary detection
)

# Retrieval system configuration (rag_graph.py)
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 15}  # Context richness optimization
)
```

- Document Ingestion: 2-3 pages/second (standard academic papers, 400-character chunks)
- Embedding Generation: 50-100 chunks/second (CPU-optimized local processing)
- Query Response Latency: 2-5 seconds average (including LLM inference and retrieval)
- Memory Footprint: 1-2GB per loaded document (including vector embeddings and cache)
- Lazy Loading: Progressive document rendering with viewport-based optimization
- Vector Caching: Persistent ChromaDB storage with automatic cache invalidation
- Batch Processing: Optimized embedding generation for large document sets
- Connection Pooling: Efficient Ollama client management with retry logic and timeout handling
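The client-management idea in the last item can be sketched like this; the retry count, the 60-second timeout, and the helper name are illustrative assumptions, not the project's actual settings:

```python
import time

import ollama

def chat_with_retry(messages, model: str = "gemma2:2b",
                    retries: int = 3, backoff: float = 1.0):
    """Retry transient Ollama failures with exponential backoff and a timeout."""
    client = ollama.Client(timeout=60)  # per-request timeout, forwarded to httpx
    for attempt in range(retries):
        try:
            return client.chat(model=model, messages=messages)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)  # 1s, 2s, 4s, ...
```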
- Horizontal Scaling: Modular architecture supporting distributed deployment
- Resource Management: Configurable memory limits and processing constraints
- Load Balancing: Support for multiple Ollama instances with intelligent routing
- Caching Strategies: Multi-level caching including vector embeddings and LLM responses
```text
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│     FastAPI     │     │    LangChain     │     │     Ollama      │
│   ASGI Server   │◄───►│   RAG Pipeline   │◄───►│   LLM Runtime   │
└─────────────────┘     └──────────────────┘     └─────────────────┘
         │                       │                        │
         ▼                       ▼                        ▼
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Static Files   │     │     ChromaDB     │     │ SentenceTransf. │
│   (Frontend)    │     │   Vector Store   │     │   (Fallback)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
```
```text
// Modular component hierarchy with dependency injection
PDFEditor (Main Controller)
├── EventManager (Event delegation and handling)
├── PDFManager (Document rendering and navigation)
├── AnnotationManager (Drawing and markup tools)
├── ChatManager (AI interaction and messaging)
├── UIManager (Interface state and layout)
├── FileManager (Upload, export, session management)
└── Utils (Shared utilities and helpers)
```

- Upload & Validation → PDF file validation and temporary storage
- Text Extraction → PyPDFLoader-based content extraction with metadata
- Semantic Chunking → Hierarchical text splitting with boundary preservation
- Vectorization → Dual-mode embedding with Ollama/SentenceTransformers
- Index Creation → ChromaDB storage with cosine similarity optimization
- Query Processing → Multi-stage retrieval with context ranking
- Response Generation → LLM inference with citation extraction and formatting
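Using the stack the README names (PyPDFLoader, RecursiveCharacterTextSplitter, Chroma), the first five stages of this flow can be condensed into a sketch like the following; the glue function is illustrative, not the project's actual ingestion code:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

def ingest_pdf(pdf_path: str, embeddings):
    pages = PyPDFLoader(pdf_path).load()         # extraction with page metadata
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=400,
        chunk_overlap=100,
        separators=["\n\n", "\n", ". ", " ", ""],
    )
    chunks = splitter.split_documents(pages)     # semantic chunking
    return Chroma.from_documents(                # vectorization + indexing
        chunks,
        embeddings,
        collection_metadata={"hnsw:space": "cosine"},  # cosine-optimized index
        persist_directory="./chroma_db",
    )
```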
The system implements a multi-layered chunking approach:
- Hierarchical Text Splitting: Recursive character-based splitting respecting document boundaries
- Semantic Boundary Preservation: Maintains paragraph and section integrity through custom separators
- Adaptive Overlap Management: 25% overlap ratio ensuring contextual continuity across chunks
- Metadata Enhancement: Comprehensive metadata extraction including page numbers, chunk IDs, and structural information
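A minimal enrichment pass might look like the sketch below; the exact fields the project stores are not documented here, so these names are assumptions:

```python
def enrich_metadata(chunks):
    """Attach chunk IDs and simple structural statistics to each chunk."""
    for i, chunk in enumerate(chunks):
        chunk.metadata.update({
            "chunk_id": i,
            # PyPDFLoader pages are 0-based; shift to 1-based for (p. X) citations
            "page": chunk.metadata.get("page", 0) + 1,
            "n_lines": chunk.page_content.count("\n") + 1,
            "n_paragraphs": chunk.page_content.count("\n\n") + 1,
            "n_chars": len(chunk.page_content),
        })
    return chunks
```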
- Primary Model: Ollama `nomic-embed-text` optimized for research document semantics
- Dimensionality: 768-dimensional vectors providing optimal semantic precision
- Normalization: L2 normalization ensuring consistent cosine similarity calculations
- Hybrid Architecture: Intelligent fallback to SentenceTransformers for offline compatibility
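The normalization claim is easy to verify: after L2 normalization, cosine similarity reduces to a plain dot product. A self-contained numpy check, independent of the actual pipeline:

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

a, b = np.random.rand(768), np.random.rand(768)  # 768-dim, like nomic-embed-text
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
dot_normalized = l2_normalize(a) @ l2_normalize(b)
assert np.isclose(cosine, dot_normalized)  # identical up to floating-point error
```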
- Semantic Similarity: Initial cosine similarity search with configurable k-value optimization
- Content Classification: Intelligent separation of substantive content from bibliographic citations
- Query Type Detection: Regex-based page-specific query identification with targeted retrieval
- Context Ranking: Multi-factor relevance scoring incorporating content type, position, and semantic density
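The content-classification step can be approximated with a heuristic like the one below; the citation markers and the 50% threshold are illustrative assumptions, not the project's actual classifier:

```python
import re

CITATION_MARKERS = re.compile(r"\[\d+\]|\(\d{4}\)|et al\.|arXiv:|doi:")

def is_reference_heavy(text: str, threshold: float = 0.5) -> bool:
    """Flag chunks in which most non-empty lines look like bibliography entries."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    hits = sum(1 for line in lines if CITATION_MARKERS.search(line))
    return hits / len(lines) >= threshold

def separate_citations(chunks):
    """Split retrieved chunks into substantive content and reference material."""
    substantive = [c for c in chunks if not is_reference_heavy(c.page_content)]
    references = [c for c in chunks if is_reference_heavy(c.page_content)]
    return substantive, references
```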
```python
# Domain-specific prompt template with structured output requirements
qa_template = """You are an expert research assistant analyzing academic papers.
Your goal is to provide clear, accurate, and comprehensive answers based solely
on the provided context.

INSTRUCTIONS:
- Answer directly with comprehensive technical details
- Always cite page numbers using format (p. X)
- Include specific technical terms and methodological details
- Focus on key contributions and empirical findings
- Provide critical analysis when appropriate"""
```
## 💡 Usage Methodology & Examples
### Basic Document Analysis Workflow
```bash
# 1. Initialize the application environment
./run.sh

# 2. Access the web interface at http://localhost:8000
# 3. Upload PDF via drag-and-drop interface or file browser
# 4. Monitor processing status through progress indicators
# 5. Initiate queries through the integrated chat interface
```

```text
# Methodological Analysis
"What experimental methodology does this paper employ for evaluation?"

# Technical Architecture Queries
"Explain the attention mechanism architecture and its computational complexity"

# Comparative Analysis
"How does this approach compare to previous transformer-based models?"

# Citation and Reference Extraction
"What are the key references and theoretical foundations mentioned?"

# Page-Specific Queries
"What is discussed on page 5 of this document?"
```
```python
# Direct API integration for programmatic access
import requests

# Document upload endpoint
files = {'file': open('research_paper.pdf', 'rb')}
response = requests.post('http://localhost:8000/upload', files=files)

# Question-answering endpoint
query_payload = {
    "question": "What is the main contribution of this paper?",
    "session_notes": []
}
response = requests.post('http://localhost:8000/qa', json=query_payload)
result = response.json()

# System status monitoring
status = requests.get('http://localhost:8000/status').json()
```

```javascript
// Programmatic access to PDF Editor functionality
window.pdfEditor.goToPageNumber(5);
window.pdfEditor.setZoomFactor(1.5);
window.pdfEditor.chatManager.sendMessage();

// State inspection and debugging
window.getPDFEditorState();
window.getPDFEditorHelp();
```

- Type Annotations: Comprehensive Python type hints with mypy compliance for enhanced IDE support
- Asynchronous Programming: Non-blocking I/O operations with FastAPI's async/await paradigms
- Error Handling: Comprehensive exception management with structured logging and user-friendly error messages
- Modular Design: Component-based architecture with dependency injection and clear separation of concerns
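In that style, a typed, non-blocking endpoint might look like this minimal sketch; the handler body is a placeholder, not the project's actual `/qa` implementation:

```python
from typing import List

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class QARequest(BaseModel):
    question: str
    session_notes: List[str] = []

@app.post("/qa")
async def answer_question(payload: QARequest) -> dict:
    """Validated request model + async handler keeps the event loop unblocked."""
    if not payload.question.strip():
        raise HTTPException(status_code=422, detail="Question must not be empty")
    # await rag_pipeline.run(payload.question)  # hypothetical non-blocking call
    return {"answer": "...", "pages": []}
```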
```javascript
// Modular ES6+ class hierarchy with event-driven communication
class PDFEditor {
    constructor() {
        this.eventManager = new EventManager();
        this.pdfManager = new PDFManager(this);
        this.chatManager = new ChatManager(this);
        this.uiManager = new UIManager(this);
        // Additional managers...
    }
}

// Component communication through event delegation
this.eventManager.addEventListener(element, 'click', handler);
```

- Plugin Architecture: Modular embedding model integration supporting custom implementations
- Document Loaders: Extensible processing pipeline for additional file formats
- LLM Abstraction: Provider-agnostic interface supporting multiple language model backends
- UI Component System: Reusable component library with customizable styling and behavior
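As a sketch of the provider-agnostic idea, the RAG pipeline can depend on an abstract interface rather than a concrete client; the class names here are illustrative assumptions:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Provider-agnostic interface the RAG pipeline depends on."""

    @abstractmethod
    def generate(self, system_prompt: str, user_prompt: str) -> str:
        ...

class OllamaProvider(LLMProvider):
    def __init__(self, model: str = "gemma2:2b"):
        self.model = model

    def generate(self, system_prompt: str, user_prompt: str) -> str:
        import ollama
        response = ollama.chat(model=self.model, messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ])
        return response["message"]["content"]

# Swapping backends means implementing generate() on a new subclass;
# the retrieval and prompting code never changes.
```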
```bash
# Development server with hot-reload capabilities
uvicorn app:app --reload --host 0.0.0.0 --port 8000

# Code quality assurance pipeline
black . && isort . && flake8 --max-line-length=88

# Type checking and validation
mypy app.py ingestion.py rag_graph.py

# Testing suite execution
pytest tests/ -v --cov=./ --cov-report=html
```

- Processing Latency: Real-time document ingestion and query response time tracking
- Memory Utilization: Dynamic monitoring of vector storage and embedding cache efficiency
- Model Performance: LLM inference latency and embedding generation throughput metrics
- System Health: Comprehensive service status monitoring with dependency checking
```bash
# Comprehensive system status endpoint
curl http://localhost:8000/status

# Response schema:
{
  "rag_graph_loaded": true,
  "vector_store_loaded": true,
  "pdf_ingestion_loaded": true,
  "current_pdf": "/uploads/document.pdf",
  "total_documents": 245
}
```

```python
# Large document processing optimization (>100 pages)
PDFIngestion(chunk_size=300, chunk_overlap=75)

# Detailed analysis configuration (smaller documents)
PDFIngestion(chunk_size=500, chunk_overlap=125)

# Memory-constrained environment settings
retriever_kwargs = {"k": 8}  # Reduced context window
```

- Ollama Service: Ensure the Ollama daemon is active (`ollama serve` or a system service)
- Model Availability: Verify required models are downloaded (`ollama list`)
- Memory Constraints: Adjust the `chunk_size` parameter for large documents (default: 400)
- Port Conflicts: Configure alternative ports in the deployment script if 8000 is occupied
```python
# High-throughput configuration for large document sets
PDFIngestion(
    chunk_size=300,    # Reduced chunk size for faster processing
    chunk_overlap=75   # Minimized overlap for memory efficiency
)

# High-precision configuration for detailed analysis
PDFIngestion(
    chunk_size=500,     # Larger chunks for richer context
    chunk_overlap=125   # Increased overlap for continuity
)

# Resource-constrained environments
retriever_kwargs = {"k": 8}  # Reduced retrieval count
```

```bash
# Verify Ollama connectivity
ollama list

# Check Python environment
source .venv/bin/activate && python --version

# Validate dependencies
pip check

# Monitor system resources
htop  # or top
```

- Local Processing: All document analysis and vector computations occur locally without external transmission
- Zero External Dependencies: Complete functionality without internet connectivity or cloud service integration
- Session Isolation: Individual document processing sessions with automatic cleanup
- Secure File Handling: Comprehensive PDF validation, sanitization, and temporary file management
- Input Validation: Multi-layer PDF validation with malicious file detection
- Resource Constraints: Configurable memory limits and processing timeouts
- Access Control: Local-only binding by default with configurable network exposure
- Audit Capabilities: Optional request logging and processing activity monitoring
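A minimal version of the validation layer could look like this sketch; the size ceiling and the specific checks are illustrative assumptions:

```python
from pathlib import Path

MAX_BYTES = 100 * 1024 * 1024  # hypothetical 100 MB upload ceiling

def validate_pdf(path: str) -> None:
    """Reject files that are not plausibly safe, well-formed PDFs."""
    pdf = Path(path)
    if pdf.suffix.lower() != ".pdf":
        raise ValueError("Only .pdf files are accepted")
    if pdf.stat().st_size > MAX_BYTES:
        raise ValueError("File exceeds the configured size limit")
    with pdf.open("rb") as fh:
        if fh.read(5) != b"%PDF-":  # magic-byte check defeats renamed files
            raise ValueError("File does not start with a PDF header")
```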
- GDPR Compliance: Local processing ensures data residency and user control
- No Data Collection: Zero telemetry or analytics data transmission
- User Sovereignty: Complete user control over document processing and storage
- Transparent Processing: Open-source architecture enabling security auditing
- Multi-Document Analysis: Cross-document semantic search and comparative analysis capabilities
- Advanced Annotation System: Collaborative markup with version control and conflict resolution
- Enhanced Export Pipeline: Research report generation with automated citation management
- Progressive Web App: Mobile-optimized interface with offline synchronization
- GPU Acceleration: CUDA-optimized embedding generation and vector operations
- Database Scaling: PostgreSQL backend for enterprise-grade deployments
- Model Fine-tuning: Domain-specific adaptation for specialized research fields
- Real-time Collaboration: WebSocket-based multi-user document analysis sessions
- Advanced RAG Techniques: Implementation of graph-based retrieval and hierarchical summarization
- Multimodal Processing: Integration of figure, table, and equation analysis capabilities
- Semantic Chunking: Machine learning-based boundary detection for optimal content segmentation
- Adaptive Context Windows: Dynamic context sizing based on query complexity and document structure
OpenRAGSearch maintains rigorous development standards and welcomes contributions from the research and engineering community. The codebase adheres to modern Python development practices with comprehensive documentation and testing frameworks.
- Code Style: Black formatting (88-character line length), isort import organization, flake8 compliance
- Type Safety: mypy type checking with strict configuration and comprehensive annotations
- Testing Framework: pytest with >90% coverage requirement and integration test suites
- Documentation: Sphinx-compatible docstrings with comprehensive API documentation
```bash
# Development environment setup
git clone https://github.com/your-username/openragsearch.git
cd openragsearch
python -m venv .venv && source .venv/bin/activate
pip install -r requirements-dev.txt

# Code quality validation
make format   # Black + isort formatting
make lint     # flake8 + mypy checking
make test     # pytest execution with coverage
```

- RAG Algorithm Improvements: Enhanced retrieval strategies and ranking algorithms
- Model Integration: Support for additional embedding models and LLM providers
- Frontend Components: Reusable UI components and interaction patterns
- Performance Optimizations: Memory efficiency and processing speed improvements
This project is licensed under the MIT License - see the LICENSE file for details.
Core Backend Dependencies:
- Python 3.8+ with asyncio and type annotation support
- FastAPI 0.116+ with Pydantic v2 validation and automatic OpenAPI generation
- LangChain 0.3+ with community integrations and custom retrieval implementations
- ChromaDB 1.0+ for high-performance vector operations and similarity search
- Ollama Client with model management and runtime optimization
Frontend Technology Stack:
- ES6+ JavaScript with modern Web APIs and component-based architecture
- CSS Grid/Flexbox with custom properties and responsive design patterns
- PDF.js 3.11+ for client-side document rendering and annotation overlay
- Font Awesome 6.4+ for comprehensive iconography and visual consistency
Infrastructure & Deployment:
- Uvicorn ASGI server with auto-reload capabilities and production optimization
- Static file serving with intelligent caching headers and compression
- RESTful API design with comprehensive OpenAPI documentation
- Modular architecture with dependency injection and inversion of control patterns
```text
# Core dependencies (requirements.txt)
fastapi==0.116.1
langchain==0.3.27
langchain-community==0.3.27
chromadb==1.0.15
ollama==0.5.1
sentence-transformers==5.0.0
pypdf==5.8.0
uvicorn==0.35.0
```

Engineered with state-of-the-art RAG methodologies and enterprise-grade architecture for intelligent document analysis and research acceleration.