A GDPR-compliant Retrieval-Augmented Generation (RAG) system that lets users in academic environments securely query their document collections.
- EU Data Sovereignty: All components comply with EU data protection regulations
- Simple Document Management: Add text files to a watched folder for automatic processing
- Metadata Support: Include bibliographic data for academic publications and other documents
- Natural Language Querying: Ask questions about your documents in natural language
- Source Citations: All answers include references to source documents
- GDPR Compliance: Built with privacy by design principles
- Enhanced Security:
  - Authentication for web interface
  - Request validation and sanitization
  - API key rotation mechanisms
  - Docker Secrets for credential management
  - Network isolation between components
  - Security headers via reverse proxy
  - Rate limiting and abuse prevention
- Vector Database: Weaviate (Netherlands-based)
- LLM Provider: Mistral AI (France-based)
- Backend: FastAPI (Python)
- Frontend: Vue.js + Nginx
- Deployment: Docker containers on Hetzner (German cloud provider)
- Docker and Docker Compose
- At least 4GB of available RAM
- Mistral AI API key
- Clone this repository:
    git clone https://github.com/ducroq/doc-chat.git
    cd doc-chat
- Set up your Mistral AI API key securely using Docker Secrets:
    mkdir -p ./secrets
    echo "your_api_key_here" > ./secrets/mistral_api_key.txt
    chmod 600 ./secrets/mistral_api_key.txt
- Start the system:
  - On Windows:
    .\start.ps1
  - On Linux:
    chmod +x start.sh stop.sh
    ./start.sh
- Access the interfaces:
  - Web interface: http://localhost:8081 (served by Nginx)
  - API documentation: http://localhost:8000/docs
  - Weaviate console: http://localhost:8080
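The API key commands above assume a POSIX shell. Where `mkdir -p`, `echo`, and `chmod` are not available, the same file could be created from Python. The following is a minimal sketch and not part of the repository: `write_secret` is a hypothetical helper, and only the path `./secrets/mistral_api_key.txt` is taken from the steps above.

```python
import os
import stat
from pathlib import Path


def write_secret(path: Path, value: str) -> Path:
    """Write a secret to `path`, creating parent directories as needed.

    Hypothetical helper for illustration; it mirrors the shell commands
    (mkdir -p, echo, chmod 600) rather than any code shipped with the project.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with owner-only permissions from the start, so the key
    # is never briefly readable by other users. On Windows the mode bits are
    # largely ignored by the operating system.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # `echo` appends a newline, so this does too.
        handle.write(value.strip() + "\n")
    # Tighten permissions in case the file already existed with a looser mode.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path
```

Calling `write_secret(Path("secrets") / "mistral_api_key.txt", "your_api_key_here")` from the repository root would then stand in for the three shell commands.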
Simply place files in the data/ directory. The system will automatically process and index them.
- Place your .txt files in the data/ directory
- For each text file, create a corresponding metadata file with the same base name:
    data/
      example.txt
      example.metadata.json
- Format the metadata file using a Zotero-inspired schema:
    {
      "itemType": "journalArticle",
      "title": "Example Paper Title",
      "creators": [
        {"firstName": "John", "lastName": "Smith", "creatorType": "author"}
      ],
      "date": "2023",
      "publicationTitle": "Journal Name",
      "tags": ["tag1", "tag2"]
    }
The system will automatically associate metadata with documents and display it when providing answers.
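To make the naming convention concrete, the sketch below pairs a text file with its `.metadata.json` sibling and turns the metadata into a short citation string. It is an illustration only: the function names (`find_metadata`, `load_metadata`, `format_citation`) and the citation layout are assumptions, not the project's actual processing code. The field names come from the schema example above.

```python
import json
from pathlib import Path


def find_metadata(text_file: Path) -> Path:
    """Return the expected metadata path: same base name, .metadata.json suffix."""
    return text_file.with_name(text_file.stem + ".metadata.json")


def load_metadata(text_file: Path) -> dict:
    """Load the metadata next to `text_file`, or return {} if none exists."""
    meta_path = find_metadata(text_file)
    if not meta_path.is_file():
        return {}
    with meta_path.open(encoding="utf-8") as handle:
        return json.load(handle)


def format_citation(metadata: dict) -> str:
    """Build a short 'Author (date). Title. Publication.' string (assumed layout)."""
    authors = [
        f"{c.get('lastName', '')}, {c.get('firstName', '')}".strip(", ")
        for c in metadata.get("creators", [])
        if c.get("creatorType") == "author"
    ]
    byline = "; ".join(authors) or "Unknown author"
    date = metadata.get("date", "n.d.")
    title = metadata.get("title", "Untitled")
    citation = f"{byline} ({date}). {title}."
    journal = metadata.get("publicationTitle")
    if journal:
        citation += f" {journal}."
    return citation
```

For the example metadata above, `format_citation` yields `Smith, John (2023). Example Paper Title. Journal Name.`; a text file without a metadata file falls back to placeholder values instead of failing.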
The system includes built-in authentication:
- JWT-based authentication for API and web interfaces
- User management via command-line tool
- Bcrypt password hashing
- Role-based access control
To set up initial authentication after installation:
# Create a JWT secret key
openssl rand -hex 32 > ./secrets/jwt_secret_key.txt
chmod 600 ./secrets/jwt_secret_key.txt
# Create an admin user
python manage_users.py create admin --generate-password --admin
For detailed information on authentication, see the Authentication System Documentation.
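To illustrate what the JWT secret key is used for, here is a minimal, standard-library-only sketch of signing and verifying a token. It rests on several assumptions not confirmed by this README: the HS256 algorithm, the `sub`, `role`, and `exp` claim names, and the helper names `sign_token` and `verify_token`. The signing is hand-rolled purely for illustration; a real deployment would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"),
                    hashlib.sha256).digest()


def sign_token(claims: dict, secret: str) -> str:
    """Create a compact HS256 JWT (the algorithm choice is an assumption)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_signature(signing_input, secret))}"


def verify_token(token: str, secret: str) -> dict:
    """Return the claims if signature and expiry check out, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _signature(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", float("inf")) < time.time():
        raise ValueError("token expired")
    return claims


# 32 random bytes as 64 hex characters, the same shape as `openssl rand -hex 32`.
demo_secret = secrets.token_hex(32)
demo_token = sign_token(
    {"sub": "admin", "role": "admin", "exp": int(time.time()) + 3600}, demo_secret
)
```

Because the signature depends on the secret, anyone who can read `jwt_secret_key.txt` can mint valid tokens, which is why the file is restricted with `chmod 600`.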
- Chat Logging: Optional logging of interactions for research purposes
- Privacy-First Design: GDPR-compliant with anonymization and automatic data retention policies
- Transparent Processing: Clear user notifications when logging is enabled
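The README does not describe how anonymization or retention are implemented, so the following is only a sketch of one common approach: replacing user identifiers with a keyed hash before logging, and dropping entries older than a retention window. All names here (`pseudonymize`, `make_log_entry`, `apply_retention`, the entry fields, the 7-day window in the usage note) are hypothetical. Note that a keyed hash is strictly pseudonymization, which is weaker than full anonymization.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone


def pseudonymize(user_id: str, salt: bytes) -> str:
    """Replace a user identifier with a keyed hash so logs hold no raw IDs."""
    digest = hmac.new(salt, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def make_log_entry(user_id: str, question: str, salt: bytes, now=None) -> dict:
    """Build a log record that stores only the pseudonym, never the user ID."""
    now = now or datetime.now(timezone.utc)
    return {
        "user": pseudonymize(user_id, salt),
        "question": question,
        "timestamp": now.isoformat(),
    }


def apply_retention(entries: list, max_age_days: int, now=None) -> list:
    """Keep only entries whose timestamp falls within the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        entry for entry in entries
        if datetime.fromisoformat(entry["timestamp"]) >= cutoff
    ]
```

Running `apply_retention(entries, max_age_days=7)` on a schedule would be one way to enforce an automatic retention policy under these assumptions.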
For more detailed information about the system, check the following documentation:
- Architecture Overview
- Authentication
- Deployment Guide
- User Guide
- Developer Guide
- Security
- Privacy Notice
This project is licensed under the Apache License 2.0; see the LICENSE file for details.