PDFBrain

A powerful Retrieval-Augmented Generation (RAG) system for querying PDF documents using multiple advanced RAG approaches. PDFBrain enables intelligent document Q&A by combining vector search with large language models, powered by Vercel AI SDK.

(Demo video: pdfBrain.mp4)

🚀 Features

  • PDF Document Upload: Upload and process PDF files for intelligent querying
  • Three RAG Approaches: Choose among three retrieval strategies, each optimized for a different type of query
    • Simple RAG with Query Rewriting: Reformulates queries for better retrieval accuracy
    • Multi-Query RAG: Generates multiple query variations and uses Reciprocal Rank Fusion for improved results
    • Query Decomposition: Breaks down complex queries into focused sub-questions
  • Streaming Responses: Real-time streaming chat interface using Vercel AI SDK
  • Vector Search: Powered by Qdrant vector database for efficient similarity search
  • Modern UI: Beautiful, responsive Next.js interface with markdown rendering

πŸ—οΈ Architecture

PDFBrain consists of two main components:

  • Client: Next.js 16 application with React 19, using Vercel AI SDK (@ai-sdk/react) for streaming chat
  • Server: Express.js backend with TypeScript, integrating LangChain for document processing and vector operations
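
A minimal sketch of how these two halves meet, assuming AI SDK v5 conventions; the route path, model id, and the `retrieveContext` helper are illustrative, not the repository's actual code:

```ts
// Server-side sketch: an Express route that streams a Gemini answer.
import express from "express";
import { streamText, convertToModelMessages, type UIMessage } from "ai";
import { google } from "@ai-sdk/google";

const app = express();
app.use(express.json());

app.post("/api/chat", async (req, res) => {
  const { messages }: { messages: UIMessage[] } = req.body;

  // Hypothetical retrieval step; the real server queries Qdrant here.
  const context = await retrieveContext(messages);

  const result = streamText({
    model: google("gemini-2.5-flash"),
    system: `Answer using only this context:\n${context}`,
    messages: convertToModelMessages(messages),
  });

  // Stream the answer in a format @ai-sdk/react's useChat can consume.
  result.pipeUIMessageStreamToResponse(res);
});

async function retrieveContext(messages: UIMessage[]): Promise<string> {
  return "...chunks retrieved from Qdrant..."; // placeholder
}

app.listen(Number(process.env.PORT ?? 8000));
```

On the client, `useChat` from @ai-sdk/react points at this endpoint and renders the stream incrementally.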

RAG Approaches

1. Simple RAG with Query Rewriting

  • How it works: The user query is first rewritten by an LLM to be more specific and detailed, improving retrieval accuracy
  • Best for: Simple queries that need refinement, ambiguous questions
  • Process:
    1. User query → LLM rewrites query
    2. Rewritten query → Vector search
    3. Retrieved documents → LLM generates answer
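
A minimal sketch of this rewrite-then-retrieve flow, assuming AI SDK's `generateText` plus LangChain's Qdrant bindings; the prompt wording and function name are illustrative:

```ts
import { generateText } from "ai";
import { google } from "@ai-sdk/google";
import { OllamaEmbeddings } from "@langchain/ollama";
import { QdrantVectorStore } from "@langchain/qdrant";

// Hypothetical helper: rewrite the query, then run a similarity search.
export async function rewriteAndRetrieve(query: string, collectionName: string) {
  // Step 1: the LLM turns a vague question into a specific, self-contained one.
  const { text: rewritten } = await generateText({
    model: google("gemini-2.5-flash"),
    prompt: `Rewrite this question to be specific and self-contained:\n${query}`,
  });

  // Step 2: search the PDF's collection with the rewritten query.
  const store = await QdrantVectorStore.fromExistingCollection(
    new OllamaEmbeddings({ model: "qwen3-embedding:4b" }),
    { url: process.env.QDRANT_URL, collectionName },
  );
  return store.similaritySearch(rewritten, 5); // Step 3: feed these to the LLM
}
```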

2. Multi-Query RAG

  • How it works: Generates 5 different variations of the user query, performs vector search for each, then uses Reciprocal Rank Fusion (RRF) to combine and rank results
  • Best for: Overcoming limitations of distance-based similarity search, improving recall
  • Process:
    1. User query + PDF context → Generate 5 query variations
    2. Each variation → Vector search
    3. Results → Reciprocal Rank Fusion
    4. Top-ranked documents → LLM generates answer
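
The fusion step needs no library support. In RRF, each document scores the sum of 1 / (k + rank) across the five result lists, with k conventionally set to 60. A sketch follows; the `Document` shape is LangChain's, and deduplicating by page content is an illustrative choice:

```ts
import type { Document } from "@langchain/core/documents";

// Reciprocal Rank Fusion: score(d) = Σ over result lists of 1 / (k + rank(d)),
// where rank starts at 1 and k (conventionally 60) damps the head of each list.
export function reciprocalRankFusion(
  resultLists: Document[][],
  k = 60,
): Document[] {
  const scores = new Map<string, { doc: Document; score: number }>();

  for (const list of resultLists) {
    list.forEach((doc, i) => {
      const key = doc.pageContent; // dedupe by content; metadata ids also work
      const entry = scores.get(key) ?? { doc, score: 0 };
      entry.score += 1 / (k + i + 1);
      scores.set(key, entry);
    });
  }

  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .map((e) => e.doc);
}
```

Documents that appear near the top of several lists accumulate the highest scores, which is what makes the fused ranking more robust than any single search.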

3. Query Decomposition

  • How it works: Breaks complex queries into 2-5 focused sub-questions, searches for each independently, then combines unique results
  • Best for: Complex, multi-step questions with multiple constraints or comparisons
  • Process:
    1. Complex query → Decompose into sub-questions
    2. Each sub-question → Vector search
    3. Combine and deduplicate results
    4. Unique documents → LLM generates answer
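
A sketch of this pipeline using AI SDK's `generateObject` with a Zod schema to get well-typed sub-questions; the prompt and the injected `search` function are illustrative:

```ts
import { generateObject } from "ai";
import { google } from "@ai-sdk/google";
import { z } from "zod";
import type { Document } from "@langchain/core/documents";

export async function decomposeAndRetrieve(
  query: string,
  search: (q: string, k: number) => Promise<Document[]>,
) {
  // Step 1: break the complex query into 2-5 focused sub-questions.
  const { object } = await generateObject({
    model: google("gemini-2.5-flash"),
    schema: z.object({ subQuestions: z.array(z.string()).min(2).max(5) }),
    prompt: `Decompose this question into independent sub-questions:\n${query}`,
  });

  // Steps 2-3: search each sub-question, then deduplicate by chunk content.
  const unique = new Map<string, Document>();
  for (const sub of object.subQuestions) {
    for (const doc of await search(sub, 3)) {
      unique.set(doc.pageContent, doc);
    }
  }
  return [...unique.values()]; // Step 4: unique chunks go to the answer LLM
}
```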

πŸ› οΈ Tech Stack

Client

  • Framework: Next.js 16
  • React: 19.2.0
  • AI SDK: Vercel AI SDK (ai, @ai-sdk/react)
  • UI: Shadcn UI, Tailwind CSS
  • State Management: Zustand
  • Markdown: react-markdown with syntax highlighting

Server

  • Runtime: Node.js with Express.js 5
  • Language: TypeScript
  • AI SDK: Vercel AI SDK (ai, @ai-sdk/google)
  • LLM: Google Gemini 2.5 Flash
  • Vector Database: Qdrant
  • Embeddings: Ollama (default: qwen3-embedding:4b)
  • Document Processing: LangChain, pdf-parse
  • File Upload: Multer

📋 Prerequisites

  • Node.js (v18 or higher)
  • pnpm (or npm/yarn)
  • Qdrant vector database (running locally or remotely)
  • Ollama (for embeddings, with qwen3-embedding:4b model)
  • Google Gemini API Key (for LLM inference)

🔧 Installation

  1. Clone the repository

     ```bash
     git clone https://github.com/Akash1000x/PDFBrain
     cd PDFBrain
     ```

  2. Install server dependencies

     ```bash
     cd server
     pnpm install
     ```

  3. Install client dependencies

     ```bash
     cd ../client
     pnpm install
     ```

βš™οΈ Configuration

Server Environment Variables

Create a .env file in the server directory:

```env
# Server Configuration
PORT=8000

# Google Gemini API
GOOGLE_GENERATIVE_AI_API_KEY=your_gemini_api_key_here

# Qdrant Configuration
QDRANT_URL=http://localhost:6333

# Embedding Model (Ollama)
EMBEDDING_MODEL=qwen3-embedding:4b
```

Client Environment Variables

Create a .env.local file in the client directory:

```env
# API URL
NEXT_PUBLIC_API_URL=http://localhost:8000/api
```

🚀 Running the Application

1. Start Qdrant (if running locally)

Using Docker Compose:

```bash
cd server
docker-compose up -d
```

Or run the container directly:

```bash
docker run -p 6333:6333 qdrant/qdrant
```

2. Start Ollama and pull the embedding model

```bash
# Start Ollama service
ollama serve

# Pull the embedding model (in another terminal)
ollama pull qwen3-embedding:4b
```

3. Start the Server

```bash
cd server
pnpm dev
```

The server will run on http://localhost:8000

4. Start the Client

```bash
cd client
pnpm dev
```

The client will run on http://localhost:3000

πŸ“ Project Structure

```
PDFBrain/
├── client/                 # Next.js frontend application
│   ├── app/                # Next.js app directory
│   ├── components/         # React components
│   │   ├── chat.tsx        # Main chat interface
│   │   ├── prompt-input.tsx
│   │   └── markdown.tsx
│   ├── lib/                # Utility functions
│   ├── store/              # Zustand state management
│   └── hooks/              # Custom React hooks
│
├── server/                 # Express.js backend
│   ├── src/
│   │   ├── controllers/    # Request handlers
│   │   │   ├── rags/
│   │   │   │   ├── basic.ts               # Query rewriting RAG
│   │   │   │   ├── multi-query.ts         # Multi-query RAG
│   │   │   │   └── query-decomposition.ts # Query decomposition RAG
│   │   │   └── upload.controller.ts
│   │   ├── middleware/     # Express middleware
│   │   ├── routes/         # API routes
│   │   ├── utils/          # Utility functions
│   │   │   ├── vectorstore.ts
│   │   │   ├── pdfloader.ts
│   │   │   └── prompts.ts
│   │   └── index.ts        # Server entry point
│   ├── uploads/            # Uploaded PDF files
│   └── pdf/                # Processed PDF data
│
└── Readme.md               # This file
```

💡 Usage

  1. Upload a PDF: Use the upload interface to add a PDF document
  2. Select RAG Approach: Choose from Simple, Multi-Query, or Query Decomposition
  3. Ask Questions: Start chatting with your document using natural language
  4. Get Answers: Receive streaming responses based on the selected RAG approach

πŸ” How It Works

  1. Document Processing: When a PDF is uploaded, it's parsed, split into chunks, and embedded using Ollama
  2. Vector Storage: Embeddings are stored in Qdrant with the filename as the collection name
  3. Query Processing: Based on the selected RAG approach, the query is processed (rewritten, decomposed, or varied)
  4. Retrieval: Relevant document chunks are retrieved using vector similarity search
  5. Generation: The LLM generates a response using the retrieved context and user query
  6. Streaming: Responses are streamed to the client in real-time
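
A sketch of steps 1 and 2 using the LangChain pieces named in the tech stack; the chunk sizes and helper name are illustrative, not necessarily the project's actual settings:

```ts
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { OllamaEmbeddings } from "@langchain/ollama";
import { QdrantVectorStore } from "@langchain/qdrant";

export async function ingestPdf(filePath: string, collectionName: string) {
  // Step 1: parse the PDF (pdf-parse under the hood) and split it into
  // overlapping chunks so each embedding covers a coherent passage.
  const docs = await new PDFLoader(filePath).load();
  const chunks = await new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  }).splitDocuments(docs);

  // Step 2: embed with Ollama and store in Qdrant, one collection per file.
  await QdrantVectorStore.fromDocuments(
    chunks,
    new OllamaEmbeddings({
      model: process.env.EMBEDDING_MODEL ?? "qwen3-embedding:4b",
    }),
    { url: process.env.QDRANT_URL, collectionName },
  );
}
```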

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


Built with ❤️
