DocuMind is a privacy-focused AI assistant that lets you chat with your documents and the web, designed to avoid the risks of uploading sensitive financial or legal documents to cloud LLMs like GPT-4. Built with a "local-first" philosophy, it leverages Ollama to run powerful LLMs directly on your machine, ensuring your data never leaves your control.
Whether you need to summarize a research paper, extract insights from a financial report, or query a GitHub repository, DocuMind provides a seamless, chat-based interface to interact with your content.
- 📄 Chat with PDFs: Upload PDF documents and ask questions. The system uses RAG (Retrieval-Augmented Generation) to provide accurate answers with source citations (a minimal sketch of this flow follows this list).
- 🌍 Universal Ingestion: Paste URLs from Wikipedia, GitHub, or any other website. DocuMind scrapes and processes the content, making it instantly queryable.
- 🎧 Audio Summaries: Generate podcast-style audio summaries of your documents to listen on the go.
- 🔒 Local & Private: All processing happens locally through Ollama. No API keys required, no data leakage.
- 🧠 Model Agnostic: Switch between different LLMs (Llama 3, Mistral, Gemma) on the fly directly from the chat interface.
- ⚡ Real-time Streaming: Enjoy a smooth, typewriter-style chat experience with low latency.
- 🎨 Modern UI: A clean, responsive interface built with Next.js and Shadcn UI, featuring dark mode support and syntax highlighting for code.
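Under the hood, the PDF flow is a standard RAG loop: extract text, chunk it, embed the chunks into a local vector store, retrieve the most relevant chunks for a question, and let the local model answer from them. The sketch below is illustrative rather than DocuMind's actual service code; it assumes the stack listed below (PyMuPDF, LangChain, ChromaDB, Ollama) and a pulled `llama3` model.

```python
# Minimal RAG sketch (illustrative, not DocuMind's actual code).
import fitz  # PyMuPDF
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_ollama import ChatOllama, OllamaEmbeddings

# 1. Extract text from the uploaded PDF.
pdf = fitz.open("report.pdf")
text = "\n".join(page.get_text() for page in pdf)

# 2. Chunk the text and index it in a local Chroma collection.
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_text(text)
store = Chroma.from_texts(chunks, OllamaEmbeddings(model="llama3"))

# 3. Retrieve the most relevant chunks and answer with the local model.
question = "What are the key findings?"
context = "\n\n".join(doc.page_content for doc in store.similarity_search(question, k=4))
answer = ChatOllama(model="llama3").invoke(
    f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```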
- Framework: Next.js 16 (App Router)
- Styling: Tailwind CSS
- Components: Shadcn UI
- Icons: Lucide React
- Markdown: `react-markdown` with `react-syntax-highlighter`
- API: FastAPI
- Vector Store: ChromaDB
- LLM Orchestration: LangChain
- PDF Processing: PyMuPDF (fitz)
- Web Scraping: `beautifulsoup4` & `WebBaseLoader`
- Audio Generation: `gTTS` (Google Text-to-Speech)
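The audio summaries combine the last two pieces above: the LLM drafts a short script and gTTS renders it to an MP3 the backend can serve. A minimal sketch, assuming text already extracted from a document and a pulled `llama3` model; variable names and paths are illustrative:

```python
# Illustrative audio-summary sketch: summarize locally, synthesize with gTTS.
from gtts import gTTS
from langchain_ollama import ChatOllama

document_text = "..."  # text previously extracted from the uploaded PDF

script = ChatOllama(model="llama3").invoke(
    "Write a short, podcast-style summary of this document:\n\n" + document_text
).content

# gTTS writes an MP3; the backend serves generated audio from its static/ folder.
gTTS(script, lang="en").save("static/summary.mp3")
```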
Before you begin, ensure you have the following installed:
- Node.js (v18 or higher)
- Python (v3.10 or higher)
- Ollama: Download and install from ollama.com.
- Pull a model: `ollama pull llama3` (or your preferred model).
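Before going further, you can confirm Ollama is running and the model is available; Ollama exposes a local REST API on port 11434 by default. A quick optional check using `httpx` (which the backend already depends on):

```python
# List locally installed Ollama models via its default REST endpoint.
import httpx

tags = httpx.get("http://localhost:11434/api/tags").json()
print([m["name"] for m in tags.get("models", [])])  # expect something like ['llama3:latest']
```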
- Clone the repository:

  ```bash
  git clone https://github.com/iamdanwi/pdf-assitant.git documind
  cd documind
  ```

- Navigate to the server directory and set up the Python environment:

  ```bash
  cd server
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  pip install -r requirements.txt
  ```

  Note: If requirements.txt is missing, install the core dependencies:

  ```bash
  pip install fastapi "uvicorn[standard]" langchain-ollama langchain-community chromadb pymupdf python-multipart httpx gTTS beautifulsoup4
  ```

- Navigate to the client directory and install dependencies:

  ```bash
  cd ../client
  npm install
  ```
- Start the Backend Server:

  ```bash
  # In /server directory
  source .venv/bin/activate
  uvicorn app.main:app --reload
  ```

  The API will be available at http://localhost:8000.

- Start the Frontend Client:

  ```bash
  # In /client directory
  npm run dev
  ```

  The application will be available at http://localhost:3000.
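As a quick backend smoke test, FastAPI serves its auto-generated interactive docs at `/docs` by default, so a simple request confirms the server is reachable (no DocuMind-specific routes are assumed here):

```python
# Verify the backend is up by hitting FastAPI's built-in docs page.
import httpx

assert httpx.get("http://localhost:8000/docs").status_code == 200
```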
- Select a Model: Use the dropdown in the chat input to select your installed Ollama model.
- Add Content:
- Upload PDF: Click the paperclip icon to upload a document.
- Add URL: Click "Add from URL" to paste a link (e.g., a Wikipedia article).
- Chat: Type your questions. The AI answers based on the uploaded context, streaming the response token by token (see the sketch after this list).
- Listen: Click "Generate Audio" to hear a summary of the active document.
- New Chat: Click the `+` icon in the sidebar to clear the context and start fresh.
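The typewriter effect comes from streaming tokens as the model generates them rather than waiting for the full reply. A minimal backend-side sketch (illustrative; the actual transport to the UI may differ), assuming a pulled `llama3` model:

```python
# Stream tokens from the local model as they are generated.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3")
for chunk in llm.stream("Summarize Retrieval-Augmented Generation in one paragraph."):
    print(chunk.content, end="", flush=True)  # each chunk carries a few tokens
```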
```
documind/
├── client/                # Next.js Frontend
│   ├── app/
│   │   ├── components/    # React Components (ChatInterface, MessageBubble)
│   │   ├── hooks/         # Custom Hooks (useChat)
│   │   └── lib/           # Utilities
│   └── public/
├── server/                # FastAPI Backend
│   ├── app/
│   │   ├── api/           # API Routes (chat, ingest, audio)
│   │   ├── core/          # Configuration
│   │   └── services/      # Business Logic (Vector Store)
│   └── static/            # Generated audio files
└── README.md
```
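A hypothetical sketch of how the backend pieces above can fit together in `server/app/main.py`; the router and function names below are illustrative stubs, not DocuMind's actual modules:

```python
# Hypothetical wiring of the FastAPI app; stubs keep the sketch self-contained.
from fastapi import APIRouter, FastAPI
from fastapi.staticfiles import StaticFiles

chat_router = APIRouter(prefix="/chat", tags=["chat"])

@chat_router.post("/")
async def chat(message: str) -> dict:
    # In the real app this would retrieve context from the vector store
    # and stream an Ollama response; here it simply echoes the input.
    return {"answer": f"echo: {message}"}

app = FastAPI(title="DocuMind API")
app.include_router(chat_router)
# Generated audio files live under server/static/ and are served statically.
app.mount("/static", StaticFiles(directory="static", check_dir=False), name="static")
```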
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository.
- Create your feature branch (`git checkout -b feature/AmazingFeature`).
- Commit your changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the branch (`git push origin feature/AmazingFeature`).
- Open a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.