This is an implementation of the Hierarchical Retrieval Augmented Generation (HiRAG) system based on the HiRAG repository by hhy-huang, with enhancements by georgiedekker.
HiRAG is an approach to retrieval-augmented generation that builds a hierarchical, multi-layer knowledge graph to improve the quality of generated responses.
Key features:
- Hierarchical knowledge organization with global, bridge, and local knowledge layers
- Multi-layer graph construction for better representation of knowledge relationships
- Dynamic retrieval process that integrates information across layers (see the sketch after this list)
- Improved results over traditional single-layer RAG approaches
- Multi-provider support (OpenAI, Ollama, DeepSeek, Azure OpenAI, and Cohere)
- Robust text sanitization for handling special characters and JSON parsing challenges
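As a rough mental model of the retrieval step, the sketch below shows how evidence from the three layers might be merged before generation. The helper signatures are hypothetical; the actual logic lives in the HiRAG package and differs in detail:

```python
from typing import Callable, Dict, List

# Hypothetical sketch of multi-layer retrieval; not the actual HiRAG API.
def hierarchical_retrieve(
    query: str,
    retrievers: Dict[str, Callable[[str, int], List[str]]],
    k: int = 5,
) -> List[str]:
    context: List[str] = []
    seen = set()
    # Collect local facts, then bridge links, then global summaries.
    for layer in ("local", "bridge", "global"):
        for passage in retrievers[layer](query, k):
            if passage not in seen:  # deduplicate across layers
                seen.add(passage)
                context.append(passage)
    return context  # handed to the LLM as grounding context
```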
If you're planning to publish this code to a Git repository, please follow these steps:
- Check for sensitive information first:

  ```bash
  ./check_sensitive_info.sh
  ```

- Ensure your `.env` file is not included in Git:

  ```bash
  # Verify .gitignore includes .env
  cat .gitignore | grep .env
  ```

- Use `REPOSITORY.md` as your README.md in the Git repository:

  ```bash
  cp REPOSITORY.md README.md
  ```
See REPOSITORY.md for detailed information about the repository structure.
- Python 3.8+ with pip
- Docker and Docker Compose (for containerized usage)
- Ollama (running locally or accessible via API) for open source models
- Neo4j (optional, for graph database storage)
- API keys for Cohere, OpenAI, DeepSeek, or Azure OpenAI (if using those providers)
- Run the setup verification script, which will check and install the necessary components:

  ```bash
  cd hi_rag
  python verify_setup.py
  ```

  This script will:

  - Install required packages from requirements.txt
  - Check if HiRAG is installed, and install it if not
  - Create the data directory if it doesn't exist

- After successful verification, you can use the hi_rag_demo.py script.
- Clone this repository
- Make sure Ollama is installed and running
- Use Docker Compose to build and run the container:

  ```bash
  cd hi_rag
  docker-compose up -d
  ```

To use the Neo4j integration:

- Install Neo4j (Community or Enterprise edition)
- Start the Neo4j service
- Create a database and set a username and password
- When running the pipeline, use the `--use-neo4j` flag along with connection details
The implementation supports multiple model providers:
Ollama provides a way to run open-source models locally. The system is configured to work with:
- GLM4 - A powerful open-source model from Tsinghua University
- rjmalagon/gte-qwen2-7b-instruct:f16 - A fine-tuned embedding model (3584 dimensions)
To use these models:
- Install and Start Ollama:

  ```bash
  # Install Ollama (if not already installed)
  curl -fsSL https://ollama.com/install.sh | sh

  # Start the Ollama service
  ollama serve
  ```

- Pull the Required Models:

  ```bash
  # Pull the GLM4 model
  ollama pull glm4

  # Pull the embedding model
  ollama pull rjmalagon/gte-qwen2-7b-instruct:f16
  ```

- Configure in .env file (a quick smoke test follows this list):

  ```bash
  # Set Ollama as provider
  PROVIDER=ollama

  # Configure Ollama endpoint (default is localhost)
  OLLAMA_BASE_URL=http://localhost:11434

  # Set default LLM model
  OPENAI_MODEL_NAME=glm4

  # Set embedding model (for vector embeddings)
  OLLAMA_EMBEDDING_MODEL=rjmalagon/gte-qwen2-7b-instruct:f16
  ```
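With that configuration in place, a quick smoke test against Ollama's `/api/generate` endpoint can confirm the model answers on the configured endpoint. This is a sketch; adjust the host and model if yours differ:

```python
import requests

# Smoke-test the Ollama endpoint and LLM model from the .env example above.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "glm4", "prompt": "Say hello in one word.", "stream": False},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["response"])
```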
This implementation includes integration with Cohere's API for entity extraction:
- Set up Cohere API Key (a quick credential check follows this list):

  ```bash
  # In your .env file
  COHERE_API_KEY=your_api_key
  COHERE_CHAT_MODEL=command
  COHERE_EMBEDDING_MODEL=embed-english-v3.0
  COHERE_EMBEDDING_DIM=1024
  ```

- Run the Cohere pipeline:

  ```bash
  ./run_cohere_pipeline.sh ingest_dir ner_dir chunker_dir
  ```
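To verify the credentials before running the pipeline, a minimal check with the Cohere SDK might look like the following (assuming the classic `cohere.Client` interface; newer SDK versions also ship a `ClientV2`):

```python
import os

import cohere

# Minimal credential/model check mirroring the .env values above.
co = cohere.Client(os.environ["COHERE_API_KEY"])
resp = co.embed(
    texts=["HiRAG builds a hierarchical knowledge graph."],
    model="embed-english-v3.0",
    input_type="search_document",
)
print(len(resp.embeddings[0]))  # expect 1024 for embed-english-v3.0
```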
You can also configure:
- DeepSeek API for "best" model functions
- OpenAI API for GPT models and embeddings
- Azure OpenAI for hosted OpenAI models
The implementation provides two Python scripts for working with HiRAG:
The run_hirag.py script automatically handles both indexing and querying in one step:
```bash
# Basic usage with default sample document
python run_hirag.py --query "What are the key features of HiRAG?"

# Specify a different document
python run_hirag.py --query "What is HiRAG?" --document path/to/your/document.txt

# Force reindexing even if vector store exists
python run_hirag.py --query "What is HiRAG?" --force-reindex

# Clean vector database (useful for fixing dimension mismatches)
python run_hirag.py --query "What is HiRAG?" --clean

# Change the query mode
python run_hirag.py --query "What is HiRAG?" --mode naive
```

For more control over the indexing and querying steps, use hi_rag_demo.py:
```bash
# Index a document
python hi_rag_demo.py --index sample_document.txt

# Run a query using the hierarchical mode
python hi_rag_demo.py --query "What are the key features of HiRAG?" --mode hi

# Interactive mode
python hi_rag_demo.py
```

A shell script run.sh is provided for easier usage:
```bash
# Setup the environment
./run.sh --setup

# Run a query
./run.sh -q "What is HiRAG?"

# Run with different modes and options
./run.sh -q "What is HiRAG?" -m naive       # Use naive RAG mode
./run.sh -q "What is HiRAG?" -f             # Force reindexing
./run.sh -q "What is HiRAG?" -c             # Clean vector database
./run.sh -q "What is HiRAG?" -d my_doc.txt  # Use a different document

# Show help
./run.sh -h
```

A pipeline integration script pipeline_integration.py and a convenience shell script run_pipeline.sh are provided to integrate HiRAG with the existing pipeline components (ingest, graph_ner, and rag_chunker).
```bash
# Show help with all available options
./run_pipeline.sh -h

# Basic usage (indexing only)
./run_pipeline.sh -i ../ingest/outputs -n ../graph_ner/output -c ../rag_chunker/output

# Index and run a query
./run_pipeline.sh -i ../ingest/outputs -n ../graph_ner/output -c ../rag_chunker/output -q "What is the main topic?"

# Using Neo4j integration
./run_pipeline.sh -i ../ingest/outputs -n ../graph_ner/output -c ../rag_chunker/output --use-neo4j

# Full Neo4j configuration
./run_pipeline.sh -i ../ingest/outputs -n ../graph_ner/output -c ../rag_chunker/output \
  --use-neo4j --neo4j-url "neo4j://localhost:7687" --neo4j-user "neo4j" --neo4j-pass "password"

# Advanced chunking configuration
./run_pipeline.sh -i ../ingest/outputs -n ../graph_ner/output -c ../rag_chunker/output \
  --chunk-size 1500 --chunk-overlap 200

# Using HNSWLib for vector database
./run_pipeline.sh -i ../ingest/outputs -n ../graph_ner/output -c ../rag_chunker/output --use-hnswlib

# Complete configuration with all features
./run_pipeline.sh -i ../ingest/outputs -n ../graph_ner/output -c ../rag_chunker/output \
  -q "What is the main topic?" -m hi --use-neo4j --neo4j-url neo4j://localhost:7687 \
  --chunk-size 1500 --chunk-overlap 200 --max-cluster-size 15 --use-hnswlib \
  --embedding-batch 64 --embedding-async 16 --naive-rag
```

To use the Cohere API for entity extraction and text processing:
```bash
# Run the Cohere pipeline with your data directories
./run_cohere_pipeline.sh ingest_dir ner_dir chunker_dir
```

The script includes robust text sanitization to ensure all chunks are properly processed, handling special characters, JSON delimiters, and other potential issues.
GLM4 is a powerful open-source model that provides high-quality generation capabilities:

```bash
# First ensure GLM4 is pulled into Ollama
ollama pull glm4

# Configure environment variables
export PROVIDER=ollama
export OPENAI_MODEL_NAME=glm4
export OLLAMA_BASE_URL=http://localhost:11434

# Run HiRAG with GLM4
python run_hirag.py --query "What are the key concepts in this document?"
```

The rjmalagon/gte-qwen2-7b-instruct:f16 model provides high-quality 3584-dimensional embeddings:
```bash
# Pull the embedding model
ollama pull rjmalagon/gte-qwen2-7b-instruct:f16

# Configure environment variables
export OLLAMA_EMBEDDING_MODEL=rjmalagon/gte-qwen2-7b-instruct:f16

# When running HiRAG, it will automatically use this model for embeddings
python run_hirag.py --query "What are the main themes?"
```

You can fine-tune Ollama's behavior in your .env file:
```bash
# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_TIMEOUT=60       # Seconds before timeout
OLLAMA_EMBEDDING_MODEL=rjmalagon/gte-qwen2-7b-instruct:f16
OLLAMA_EMBEDDING_DIM=3584
OLLAMA_CONCURRENCY=4    # Maximum concurrent requests
```
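As an illustration of how `OLLAMA_CONCURRENCY` can be honored, the sketch below caps in-flight embedding requests with a semaphore. This is an assumption about the mechanism (using `aiohttp`); HiRAG's internals may differ:

```python
import asyncio
import os

import aiohttp

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_CONCURRENCY = int(os.getenv("OLLAMA_CONCURRENCY", "4"))

# Semaphore caps the number of simultaneous requests to Ollama.
_sem = asyncio.Semaphore(OLLAMA_CONCURRENCY)

async def embed(session: aiohttp.ClientSession, text: str) -> list:
    async with _sem:  # at most OLLAMA_CONCURRENCY in-flight requests
        async with session.post(
            f"{OLLAMA_BASE_URL}/api/embeddings",
            json={
                "model": os.getenv(
                    "OLLAMA_EMBEDDING_MODEL", "rjmalagon/gte-qwen2-7b-instruct:f16"
                ),
                "prompt": text,
            },
        ) as resp:
            resp.raise_for_status()
            return (await resp.json())["embedding"]
```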
HiRAG supports two vector database backends:
- NanoVectorDB (default): Simpler and lightweight
- HNSWLib: More optimized for larger datasets
Use the `--use-hnswlib` flag to switch to HNSWLib.
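For reference, this is roughly what the HNSWLib backend does under the hood. A toy example with random vectors follows; the parameter values are illustrative, not HiRAG's defaults:

```python
import hnswlib
import numpy as np

dim = 3584  # matches rjmalagon/gte-qwen2-7b-instruct:f16 embeddings
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=10_000, ef_construction=200, M=16)

# Insert 100 random vectors with integer ids.
vectors = np.random.rand(100, dim).astype(np.float32)
index.add_items(vectors, np.arange(100))

index.set_ef(50)  # query-time accuracy/speed trade-off
labels, distances = index.knn_query(vectors[:1], k=5)
print(labels, distances)
```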
HiRAG supports two graph storage backends:
- NetworkX (default): Stores graph data in local files
- Neo4j: Stores graph data in a Neo4j database
Use the `--use-neo4j` flag to switch to Neo4j storage.
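A quick way to confirm the Neo4j backend is reachable is a small connectivity check with the official Python driver (placeholder credentials; match them to your `--neo4j-*` flags):

```python
from neo4j import GraphDatabase

# Placeholder URL and credentials; use your own --neo4j-* values.
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
driver.verify_connectivity()

with driver.session() as session:
    count = session.run("MATCH (n) RETURN count(n) AS n").single()["n"]
    print(f"nodes in graph: {count}")

driver.close()
```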
You can customize how documents are chunked:
- `--chunk-size`: Size of each chunk in tokens (default: 1200)
- `--chunk-overlap`: Overlap between consecutive chunks in tokens (default: 100; see the sketch below)
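To illustrate what these two parameters mean, here is a minimal token-based chunker with overlap. It uses tiktoken's `cl100k_base` encoding as an assumption; HiRAG's tokenizer may differ:

```python
import tiktoken

def chunk_by_tokens(text: str, chunk_size: int = 1200, overlap: int = 100):
    """Yield overlapping chunks of roughly chunk_size tokens each."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = chunk_size - overlap  # each new chunk starts `overlap` tokens early
    for start in range(0, len(tokens), step):
        yield enc.decode(tokens[start : start + chunk_size])
```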
You can customize embedding generation:
- `--embedding-batch`: Number of texts to embed in a single batch (default: 32)
- `--embedding-async`: Maximum concurrent embedding function calls (default: 8)
You can customize graph clustering:
- `--max-cluster-size`: Maximum number of clusters to create (default: 10)
You can choose between different RAG modes:
- `--naive-rag`: Enable naive RAG mode (no hierarchical features)
- `--no-hierarchical`: Disable hierarchical mode
The implementation includes a robust text sanitization module to handle special characters and JSON parsing challenges:
- Character Escaping: Automatically escapes backslashes, quotes, newlines, and other special characters
- JSON Safety: Ensures all text is safe for inclusion in JSON structures
- Error Recovery: Handles common JSON parsing errors like missing commas
- Recursive Sanitization: Sanitizes all text fields in nested data structures
This is particularly important when working with the Cohere API, which may encounter issues with malformed JSON.
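A simplified sketch of the recursive sanitization idea follows; the real text_sanitizer.py handles more cases:

```python
import json
from typing import Any

def sanitize(value: Any) -> Any:
    """Recursively escape strings so they are safe inside JSON documents."""
    if isinstance(value, str):
        # json.dumps escapes backslashes, quotes, newlines, and control
        # characters; strip the surrounding quotes it adds.
        return json.dumps(value)[1:-1]
    if isinstance(value, dict):
        return {key: sanitize(val) for key, val in value.items()}
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    return value
```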
If you're using the Docker setup, run commands inside the container:
```bash
# Copy your document to the data directory first
cp sample_document.txt data/

# Run inside the container
docker exec -it hirag_hirag_1 python run_hirag.py --query "What is HiRAG?"
```

The system supports several query modes:

- `hi`: Full hierarchical retrieval (default)
- `naive`: Traditional RAG approach
- `hi_nobridge`: Hierarchical retrieval without the bridge layer
- `hi_local`: Using only local knowledge
- `hi_global`: Using only global knowledge
- `hi_bridge`: Using only bridge knowledge
- `Dockerfile`: Container definition for running HiRAG
- `docker-compose.yml`: Orchestration for HiRAG and Ollama services
- `config.yaml`: Configuration for the various models and parameters
- `.env.example`: Example environment variables file
- `hi_rag_demo.py`: Main implementation file demonstrating HiRAG usage
- `run_hirag.py`: Combined script for indexing and querying in one step
- `run.sh`: Convenient shell script for common operations
- `test_hirag.py`: Unit tests for the HiRAG implementation
- `sample_document.txt`: Example document for indexing and querying
- `verify_setup.py`: Script to verify and set up the environment
- `pipeline_integration.py`: Script to integrate HiRAG with the existing pipeline
- `run_pipeline.sh`: Convenient shell script for pipeline integration
- `run_cohere_pipeline.sh`: Script for running the Cohere entity extraction pipeline
- `text_sanitizer.py`: Module for ensuring text is properly escaped and safe for JSON
- `mini_entity_extract.py`: Extracts entities using the Cohere API
- `test_sanitizer.py`: Tests for the text sanitization functionality
- `test_pipeline.py`: Test script for the pipeline integration
- `check_sensitive_info.sh`: Script to check for sensitive information before Git publishing
- `REPOSITORY.md`: Documentation specifically for the Git repository
The implementation leverages the original HiRAG codebase with custom configurations:
- Model Providers:
  - Ollama: Local models like GLM4 and rjmalagon/gte-qwen2-7b-instruct:f16
  - Cohere: Entity extraction and embeddings
  - DeepSeek: Chat and advanced LLM operations
  - OpenAI/Azure: Optional providers for GPT models
- Features:
  - Hierarchical knowledge organization
  - Entity-based retrieval
  - Text sanitization and error handling
  - Multiple storage options (NanoVectorDB, HNSWLib, Neo4j)
  - Configurable chunking, embedding, and clustering
- Pipeline Integration:
  - Seamless connection with ingest, NER, and chunker components
  - Comprehensive output consolidation
  - Multi-provider workflow support
If you encounter an error like `OSError: [Errno 30] Read-only file system: '/app'`, it means you're trying to use Docker paths in your local environment. The script has been updated to automatically detect and use local paths when needed.
If you see `ModuleNotFoundError: No module named 'hirag'`, run the verification script:

```bash
python verify_setup.py
```

This will install the HiRAG package and its dependencies.
If you see an error like `AssertionError: Embedding dim mismatch, expected: 3584, but loaded: 1536`, it means there's a mismatch between the configured embedding dimensions and the existing vector database. To fix this, use the `--clean` option:
```bash
# Using run_hirag.py directly
python run_hirag.py --query "What is HiRAG?" --clean

# Using the shell script
./run.sh -q "What is HiRAG?" -c

# Using the pipeline integration
./run_pipeline.sh -i ../ingest/outputs -n ../graph_ner/output -c ../rag_chunker/output --clean
```

This will delete the existing vector database files and create a new one with the correct dimensions.
If you encounter JSON parsing errors with the Cohere API, the system now includes robust text sanitization:
- The text sanitizer automatically escapes special characters
- The JSON parser has recovery mechanisms for common errors (sketched below)
- Automatic retry logic is implemented for problematic chunks
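A condensed sketch of that recovery path (not the exact implementation) could look like:

```python
import json
import re

def parse_with_recovery(raw: str) -> dict:
    """Try strict JSON parsing first, then repair a common LLM-output error."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Insert missing commas between adjacent objects, e.g. '}{' -> '}, {'.
        repaired = re.sub(r"\}\s*\{", "}, {", raw)
        return json.loads(repaired)
```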
If you encounter errors connecting to Neo4j, check:
- Neo4j service is running
- Credentials are correct
- Connection URL is correct (`neo4j://localhost:7687` is the default)
- Neo4j APOC plugin is installed (required for some graph algorithms)
If queries fail with various errors even after fixing other issues, make sure a document has been indexed first. The run_hirag.py script will automatically index a document if needed.
If you encounter errors with Ollama models:
- Check if the model is pulled:

  ```bash
  ollama list
  ```

- Verify Ollama is running:

  ```bash
  curl http://localhost:11434/api/tags
  ```

- Check model dimensions: for embedding models, ensure `OLLAMA_EMBEDDING_DIM` matches the model's dimensions (e.g., 3584 for rjmalagon/gte-qwen2-7b-instruct:f16); the snippet below shows how to check this
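The dimension a model actually returns can be checked directly against Ollama's `/api/embeddings` endpoint. This is a sketch; swap in your own model name:

```python
import requests

# Ask Ollama for one embedding and report its dimension, which is the
# value OLLAMA_EMBEDDING_DIM must match.
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "rjmalagon/gte-qwen2-7b-instruct:f16", "prompt": "test"},
    timeout=60,
)
resp.raise_for_status()
print(len(resp.json()["embedding"]))  # expect 3584 for this model
```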
Run the included tests to verify the implementation:
```bash
# Test HiRAG core functionality
python test_hirag.py

# Test pipeline integration
python test_pipeline.py

# Test text sanitization
python test_sanitizer.py
```