A powerful Retrieval-Augmented Generation (RAG) system for querying PDF documents using multiple advanced RAG approaches. PDFBrain enables intelligent document Q&A by combining vector search with large language models, powered by Vercel AI SDK.
Demo video: pdfBrain.mp4
- PDF Document Upload: Upload and process PDF files for intelligent querying
- Three RAG Approaches: Choose from different retrieval strategies optimized for various query types
- Simple RAG with Query Rewriting: Reformulates queries for better retrieval accuracy
- Multi-Query RAG: Generates multiple query variations and uses Reciprocal Rank Fusion for improved results
- Query Decomposition: Breaks down complex queries into focused sub-questions
- Streaming Responses: Real-time streaming chat interface using Vercel AI SDK
- Vector Search: Powered by Qdrant vector database for efficient similarity search
- Modern UI: Beautiful, responsive Next.js interface with markdown rendering
PDFBrain consists of two main components:
- Client: Next.js 16 application with React 19, using Vercel AI SDK (`@ai-sdk/react`) for streaming chat (sketched below)
- Server: Express.js backend with TypeScript, integrating LangChain for document processing and vector operations
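For the client side, a minimal sketch of a streaming chat component built on `useChat` from `@ai-sdk/react` is shown below. The endpoint path and the v4-style hook shape are assumptions for illustration; the actual `chat.tsx` in this repo may differ.

```tsx
"use client";

import { useChat } from "@ai-sdk/react";

export function Chat() {
  // Hypothetical endpoint path; the real route name lives in the server's routes/.
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: `${process.env.NEXT_PUBLIC_API_URL}/chat`,
  });

  return (
    <div>
      {/* Render the conversation so far; assistant messages grow as tokens stream in */}
      {messages.map((m) => (
        <p key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </p>
      ))}
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask a question about your PDF…"
        />
      </form>
    </div>
  );
}
```

The hook posts the conversation to the server and appends streamed tokens to `messages` as they arrive.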
**Simple RAG with Query Rewriting**
- How it works: The user query is first rewritten by an LLM to be more specific and detailed, improving retrieval accuracy (see the sketch after the process steps)
- Best for: Simple queries that need refinement, ambiguous questions
- Process:
- User query → LLM rewrites query
- Rewritten query → Vector search
- Retrieved documents → LLM generates answer
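A minimal sketch of this flow, assuming the server stack listed further down (Gemini via `@ai-sdk/google`, Qdrant via `@langchain/qdrant`, Ollama embeddings). The `getVectorStore` helper, prompts, model id, and result counts are illustrative, not the repo's actual controller code:

```ts
import { google } from "@ai-sdk/google";
import { generateText, streamText } from "ai";
import { OllamaEmbeddings } from "@langchain/ollama";
import { QdrantVectorStore } from "@langchain/qdrant";

// Illustrative helper: open the existing Qdrant collection for an uploaded PDF.
async function getVectorStore(collectionName: string) {
  return QdrantVectorStore.fromExistingCollection(
    new OllamaEmbeddings({ model: process.env.EMBEDDING_MODEL ?? "qwen3-embedding:4b" }),
    { url: process.env.QDRANT_URL, collectionName }
  );
}

export async function simpleRag(userQuery: string, collectionName: string) {
  // 1. User query → LLM rewrites query
  const { text: rewrittenQuery } = await generateText({
    model: google("gemini-2.5-flash"),
    prompt: `Rewrite this question to be more specific and detailed for document retrieval:\n\n${userQuery}`,
  });

  // 2. Rewritten query → vector search
  const store = await getVectorStore(collectionName);
  const docs = await store.similaritySearch(rewrittenQuery, 5);
  const context = docs.map((d) => d.pageContent).join("\n---\n");

  // 3. Retrieved documents → LLM generates a streaming answer
  return streamText({
    model: google("gemini-2.5-flash"),
    system: `Answer using only the following context:\n${context}`,
    prompt: userQuery,
  });
}
```

The `streamText` result can then be piped to the Express response so tokens reach the client as they are generated.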
**Multi-Query RAG**
- How it works: Generates 5 different variations of the user query, performs vector search for each, then uses Reciprocal Rank Fusion (RRF) to combine and rank results (RRF sketch after the process steps)
- Best for: Overcoming limitations of distance-based similarity search, improving recall
- Process:
- User query + PDF context → Generate 5 query variations
- Each variation → Vector search
- Results → Reciprocal Rank Fusion
- Top-ranked documents → LLM generates answer
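The fusion step itself is small. Here is a sketch of Reciprocal Rank Fusion over the per-variation result lists, assuming LangChain `Document` objects; using `pageContent` as the identity key and `k = 60` are illustrative choices:

```ts
import type { Document } from "@langchain/core/documents";

// Reciprocal Rank Fusion: each document scores the sum of 1 / (k + rank)
// across every result list it appears in (rank starts at 1).
function reciprocalRankFusion(resultLists: Document[][], k = 60): Document[] {
  const scores = new Map<string, { doc: Document; score: number }>();

  for (const list of resultLists) {
    list.forEach((doc, index) => {
      // Simplistic identity key; a real implementation would use a stable chunk id.
      const key = doc.pageContent;
      const entry = scores.get(key) ?? { doc, score: 0 };
      entry.score += 1 / (k + index + 1);
      scores.set(key, entry);
    });
  }

  // Highest combined score first
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .map((e) => e.doc);
}
```

Because scores accumulate across lists, chunks retrieved by several query variations rise to the top even if no single search ranked them first.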
**Query Decomposition**
- How it works: Breaks complex queries into 2-5 focused sub-questions, searches for each independently, then combines unique results (sketched after the process steps)
- Best for: Complex, multi-step questions with multiple constraints or comparisons
- Process:
- Complex query → Decompose into sub-questions
- Each sub-question → Vector search
- Combine and deduplicate results
- Unique documents → LLM generates answer
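A sketch of the decomposition and retrieval steps, assuming `generateObject` with a Zod schema to get structured sub-questions from the LLM; the `search` callback, prompt wording, and per-question result count are placeholders rather than the repo's exact implementation:

```ts
import { google } from "@ai-sdk/google";
import { generateObject } from "ai";
import { z } from "zod";
import type { Document } from "@langchain/core/documents";

// Hypothetical search function, e.g. wrapping QdrantVectorStore.similaritySearch.
type Search = (query: string, k: number) => Promise<Document[]>;

export async function decomposeAndRetrieve(complexQuery: string, search: Search) {
  // 1. Complex query → 2-5 focused sub-questions
  const { object } = await generateObject({
    model: google("gemini-2.5-flash"),
    schema: z.object({ subQuestions: z.array(z.string()).min(2).max(5) }),
    prompt: `Break this question into 2-5 focused sub-questions:\n\n${complexQuery}`,
  });

  // 2. Each sub-question → vector search (in parallel)
  const perQuestion = await Promise.all(
    object.subQuestions.map((q) => search(q, 3))
  );

  // 3. Combine and deduplicate results by content
  const unique = new Map<string, Document>();
  for (const doc of perQuestion.flat()) unique.set(doc.pageContent, doc);

  // 4. Unique documents → passed as context to the answering LLM
  return [...unique.values()];
}
```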
**Client**
- Framework: Next.js 16
- React: 19.2.0
- AI SDK: Vercel AI SDK (`ai`, `@ai-sdk/react`)
- UI: Shadcn UI, Tailwind CSS
- State Management: Zustand
- Markdown: react-markdown with syntax highlighting
**Server**
- Runtime: Node.js with Express.js 5
- Language: TypeScript
- AI SDK: Vercel AI SDK (`ai`, `@ai-sdk/google`)
- LLM: Google Gemini 2.5 Flash
- Vector Database: Qdrant
- Embeddings: Ollama (default: qwen3-embedding:4b)
- Document Processing: LangChain, pdf-parse
- File Upload: Multer
- Node.js (v18 or higher)
- pnpm (or npm/yarn)
- Qdrant vector database (running locally or remotely)
- Ollama (for embeddings, with the `qwen3-embedding:4b` model)
- Google Gemini API Key (for LLM inference)
- Clone the repository

  ```bash
  git clone https://github.com/Akash1000x/PDFBrain
  cd PDFBrain
  ```

- Install server dependencies

  ```bash
  cd server
  pnpm install
  ```

- Install client dependencies

  ```bash
  cd ../client
  pnpm install
  ```
Create a .env file in the server directory:

```env
# Server Configuration
PORT=8000

# Google Gemini API
GOOGLE_GENERATIVE_AI_API_KEY=your_gemini_api_key_here

# Qdrant Configuration
QDRANT_URL=http://localhost:6333

# Embedding Model (Ollama)
EMBEDDING_MODEL=qwen3-embedding:4b
```

Create a .env.local file in the client directory:

```env
# API URL
NEXT_PUBLIC_API_URL=http://localhost:8000/api
```

Start Qdrant using Docker:
```bash
cd server
docker-compose up -d
```

Or:

```bash
docker run -p 6333:6333 qdrant/qdrant
```

Start Ollama and pull the embedding model:

```bash
# Start Ollama service
ollama serve

# Pull the embedding model (in another terminal)
ollama pull qwen3-embedding:4b
```

Start the server:

```bash
cd server
pnpm dev
```

The server will run on http://localhost:8000
Start the client:

```bash
cd client
pnpm dev
```

The client will run on http://localhost:3000
```
pdf-rag/
├── client/                        # Next.js frontend application
│   ├── app/                       # Next.js app directory
│   ├── components/                # React components
│   │   ├── chat.tsx               # Main chat interface
│   │   ├── prompt-input.tsx
│   │   └── markdown.tsx
│   ├── lib/                       # Utility functions
│   ├── store/                     # Zustand state management
│   └── hooks/                     # Custom React hooks
│
├── server/                        # Express.js backend
│   ├── src/
│   │   ├── controllers/           # Request handlers
│   │   │   ├── rags/
│   │   │   │   ├── basic.ts                 # Query rewriting RAG
│   │   │   │   ├── multi-query.ts           # Multi-query RAG
│   │   │   │   └── query-decomposition.ts   # Query decomposition RAG
│   │   │   └── upload.controller.ts
│   │   ├── middleware/            # Express middleware
│   │   ├── routes/                # API routes
│   │   ├── utils/                 # Utility functions
│   │   │   ├── vectorstore.ts
│   │   │   ├── pdfloader.ts
│   │   │   └── prompts.ts
│   │   └── index.ts               # Server entry point
│   ├── uploads/                   # Uploaded PDF files
│   └── pdf/                       # Processed PDF data
│
└── Readme.md                      # This file
```
- Upload a PDF: Use the upload interface to add a PDF document
- Select RAG Approach: Choose from Simple, Multi-Query, or Query Decomposition
- Ask Questions: Start chatting with your document using natural language
- Get Answers: Receive streaming responses based on the selected RAG approach
- Document Processing: When a PDF is uploaded, it's parsed, split into chunks, and embedded using Ollama (ingestion sketch after this list)
- Vector Storage: Embeddings are stored in Qdrant with the filename as the collection name
- Query Processing: Based on the selected RAG approach, the query is processed (rewritten, decomposed, or varied)
- Retrieval: Relevant document chunks are retrieved using vector similarity search
- Generation: The LLM generates a response using the retrieved context and user query
- Streaming: Responses are streamed to the client in real-time
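A sketch of the ingestion side (steps 1-2 above), assuming LangChain's PDF loader, text splitter, Ollama embeddings, and Qdrant integration; the chunk sizes and helper name are illustrative rather than the repo's exact `upload.controller.ts` logic:

```ts
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { OllamaEmbeddings } from "@langchain/ollama";
import { QdrantVectorStore } from "@langchain/qdrant";

export async function ingestPdf(filePath: string, fileName: string) {
  // 1. Parse the uploaded PDF into LangChain documents (pdf-parse under the hood)
  const docs = await new PDFLoader(filePath).load();

  // 2. Split into overlapping chunks for embedding
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  });
  const chunks = await splitter.splitDocuments(docs);

  // 3. Embed with Ollama and store in Qdrant, one collection per uploaded file
  await QdrantVectorStore.fromDocuments(
    chunks,
    new OllamaEmbeddings({ model: process.env.EMBEDDING_MODEL ?? "qwen3-embedding:4b" }),
    { url: process.env.QDRANT_URL, collectionName: fileName }
  );
}
```

Queries against that collection then follow the retrieval and streaming steps described in the RAG approach sketches above.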
Contributions are welcome! Please feel free to submit a Pull Request.
Built with ❤️