Comprehensive benchmarking tools and RAG examples for the SAGE framework
SAGE Benchmark provides a comprehensive suite of benchmarking tools and RAG (Retrieval-Augmented Generation) examples for evaluating SAGE framework performance. This package enables researchers and developers to:
- Benchmark RAG pipelines with multiple retrieval strategies (dense, sparse, hybrid)
- Compare vector databases (Milvus, ChromaDB, FAISS) for RAG applications
- Evaluate multimodal retrieval with text, image, and video data
- Run reproducible experiments with standardized configurations and metrics
This package is designed for both research experiments and production system evaluation.
- Multiple RAG Implementations: Dense, sparse, hybrid, and multimodal retrieval
- Vector Database Support: Milvus, ChromaDB, FAISS integration
- Experiment Framework: Automated benchmarking with configurable experiments
- Evaluation Metrics: Comprehensive metrics for RAG performance
- Sample Data: Included test data for quick start
- Extensible Design: Easy to add new benchmarks and retrieval methods (a sketch follows below)
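The module READMEs referenced later describe the actual extension hooks; as a loose, purely illustrative sketch (hypothetical names, not the package's API), a pluggable retrieval method can be any object that exposes a `retrieve(query, top_k)` call returning scored documents:

```python
# Purely illustrative sketch (hypothetical names, not the package's API):
# a new retrieval method is a small object exposing retrieve(query, top_k).
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ScoredDoc:
    doc_id: str
    score: float


class Retriever(Protocol):
    def retrieve(self, query: str, top_k: int = 5) -> list[ScoredDoc]: ...


class KeywordRetriever:
    """Toy retriever that scores documents by keyword overlap with the query."""

    def __init__(self, corpus: dict[str, str]):
        self.corpus = corpus

    def retrieve(self, query: str, top_k: int = 5) -> list[ScoredDoc]:
        terms = set(query.lower().split())
        scored = [
            ScoredDoc(doc_id, len(terms & set(text.lower().split())))
            for doc_id, text in self.corpus.items()
        ]
        return sorted(scored, key=lambda d: d.score, reverse=True)[:top_k]


docs = {"d1": "SAGE is a streaming framework", "d2": "Milvus stores dense vectors"}
print(KeywordRetriever(docs).retrieve("what is SAGE", top_k=1))
```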
```
sage-benchmark/
├── src/
│   └── sage/
│       └── benchmark/
│           ├── __init__.py
│           └── benchmark_rag/                  # RAG benchmarking
│               ├── __init__.py
│               ├── implementations/            # RAG implementations
│               │   ├── pipelines/              # RAG pipeline scripts
│               │   │   ├── qa_dense_retrieval_milvus.py
│               │   │   ├── qa_sparse_retrieval_milvus.py
│               │   │   ├── qa_multimodal_fusion.py
│               │   │   └── ...
│               │   └── tools/                  # Supporting tools
│               │       ├── build_chroma_index.py
│               │       ├── build_milvus_dense_index.py
│               │       └── loaders/
│               ├── evaluation/                 # Experiment framework
│               │   ├── pipeline_experiment.py
│               │   ├── evaluate_results.py
│               │   └── config/
│               ├── config/                     # RAG configurations
│               └── data/                       # Test data
│           # Future benchmarks:
│           # ├── benchmark_agent/              # Agent benchmarking
│           # └── benchmark_anns/               # ANNS benchmarking
├── tests/
├── pyproject.toml
└── README.md
```
Install the benchmark package:
```bash
pip install -e packages/sage-benchmark
```

Or with development dependencies:

```bash
pip install -e "packages/sage-benchmark[dev]"
```

Note: The `sage.data` module is included as a submodule in the package and will be installed automatically. It contains datasets for various benchmarks, including LibAMM datasets.
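After installation, a quick import check confirms the package is visible to Python (a minimal sketch; it assumes only the `sage.benchmark` module shown in the package layout above):

```python
# Optional sanity check after installation: confirm the package imports
# and show where it was loaded from.
import importlib

mod = importlib.import_module("sage.benchmark")
print("sage.benchmark loaded from:", mod.__file__)
```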
The `benchmark_rag` module provides comprehensive RAG benchmarking capabilities. It includes several RAG approaches for performance comparison:
Vector Databases:
- Milvus: Dense, sparse, and hybrid retrieval
- ChromaDB: Local vector database with simple setup
- FAISS: Efficient similarity search
Retrieval Methods:
- Dense retrieval (embeddings-based)
- Sparse retrieval (BM25, sparse vectors)
- Hybrid retrieval (combining dense + sparse; see the fusion sketch after this list)
- Multimodal fusion (text + image + video)
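The hybrid pipelines combine the dense and sparse signals; the exact fusion they apply is defined by the pipeline implementations, but as a generic illustration (not the package's API), a weighted-sum fusion over min-max-normalized scores looks like this:

```python
# Generic sketch of hybrid score fusion (illustrative only, not the
# package's API). Each retriever returns {doc_id: score}; the hybrid
# score is a weighted sum of min-max-normalized dense and sparse scores.
def normalize(scores: dict[str, float]) -> dict[str, float]:
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}


def hybrid_fuse(dense: dict[str, float], sparse: dict[str, float], alpha: float = 0.5) -> dict[str, float]:
    dense_n, sparse_n = normalize(dense), normalize(sparse)
    docs = set(dense_n) | set(sparse_n)
    return {d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * sparse_n.get(d, 0.0) for d in docs}


# Toy example: doc2 trails on the dense score but leads on the sparse
# score, and comes out on top once the two signals are combined.
dense_scores = {"doc1": 0.9, "doc2": 0.85, "doc3": 0.2}
sparse_scores = {"doc2": 12.0, "doc1": 3.0, "doc3": 7.0}
print(hybrid_fuse(dense_scores, sparse_scores))
```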
First, prepare your vector index:
```bash
# Build ChromaDB index (simplest)
python -m sage.benchmark.benchmark_rag.implementations.tools.build_chroma_index

# Or build Milvus dense index
python -m sage.benchmark.benchmark_rag.implementations.tools.build_milvus_dense_index
```

Then, test individual RAG pipelines:
```bash
# Dense retrieval with Milvus
python -m sage.benchmark.benchmark_rag.implementations.pipelines.qa_dense_retrieval_milvus

# Sparse retrieval
python -m sage.benchmark.benchmark_rag.implementations.pipelines.qa_sparse_retrieval_milvus

# Hybrid retrieval (dense + sparse)
python -m sage.benchmark.benchmark_rag.implementations.pipelines.qa_hybrid_retrieval_milvus
```

Execute the full benchmark suite:
```bash
# Run comprehensive benchmark
python -m sage.benchmark.benchmark_rag.evaluation.pipeline_experiment

# Evaluate and generate reports
python -m sage.benchmark.benchmark_rag.evaluation.evaluate_results
```

Results are saved in `benchmark_results/`:

- `experiment_TIMESTAMP/` - Individual experiment runs
- `metrics.json` - Performance metrics (see the loading sketch below)
- `comparison_report.md` - Comparison report
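To inspect an individual run afterwards, `metrics.json` can be read directly. A hedged sketch, assuming only the directory layout listed above (the metric keys themselves depend on the experiment configuration):

```python
# Illustrative sketch: print the metrics of the most recent run.
# Assumes benchmark_results/experiment_*/metrics.json as listed above;
# the metric keys depend on the experiment configuration.
import json
from pathlib import Path

run_dir = sorted(Path("benchmark_results").glob("experiment_*"))[-1]  # latest run
metrics = json.loads((run_dir / "metrics.json").read_text())
for name, value in metrics.items():
    print(f"{name}: {value}")
```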
Run a RAG pipeline programmatically:

```python
from sage.benchmark.benchmark_rag.implementations.pipelines import (
    qa_dense_retrieval_milvus,
)
from sage.benchmark.benchmark_rag.config import load_config

# Load configuration
config = load_config("config_dense_milvus.yaml")

# Run RAG pipeline
results = qa_dense_retrieval_milvus.run_pipeline(query="What is SAGE?", config=config)

# View results
print(f"Retrieved {len(results)} documents")
for doc in results:
    print(f"- {doc.content[:100]}...")
```
Define and run a custom experiment:

```python
from sage.benchmark.benchmark_rag.evaluation import PipelineExperiment

# Define experiment configuration
experiment = PipelineExperiment(
    name="custom_rag_benchmark",
    pipelines=["dense", "sparse", "hybrid"],
    queries=["query1.txt", "query2.txt"],
    metrics=["precision", "recall", "latency"],
)

# Run experiment
results = experiment.run()

# Generate report
experiment.generate_report(results)
```

Configuration files are located in `sage/benchmark/benchmark_rag/config/`:
- `config_dense_milvus.yaml` - Dense retrieval configuration
- `config_sparse_milvus.yaml` - Sparse retrieval configuration
- `config_hybrid_milvus.yaml` - Hybrid retrieval configuration
- `config_qa_chroma.yaml` - ChromaDB configuration
Experiment configurations are in `sage/benchmark/benchmark_rag/evaluation/config/`:

- `experiment_config.yaml` - Benchmark experiment settings
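To peek at the experiment settings from Python, the YAML file can be loaded directly (a minimal sketch assuming PyYAML is installed; adjust the path to where the package lives in your checkout):

```python
# Minimal sketch: load the experiment settings with PyYAML and list the
# top-level keys. The path below assumes a source checkout.
import yaml

path = "packages/sage-benchmark/src/sage/benchmark/benchmark_rag/evaluation/config/experiment_config.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)
print(list(cfg))
```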
Test data is included in the package:

- Benchmark Data (`benchmark_rag/data/`):
  - `queries.jsonl` - Sample queries for testing (see the loading sketch below)
  - `qa_knowledge_base.*` - Knowledge base in multiple formats (txt, md, pdf, docx)
  - `sample/` - Additional sample documents for testing
- Benchmark Config (`benchmark_rag/config/`):
  - `experiment_config.yaml` - RAG benchmark configurations
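To look at the sample data directly, `queries.jsonl` can be loaded with the standard library. A minimal sketch, assuming a source checkout and one JSON object per line (field names may differ between datasets):

```python
# Illustrative sketch: read the bundled sample queries. The path assumes
# a source checkout; field names inside each record may differ.
import json
from pathlib import Path

data_dir = Path("packages/sage-benchmark/src/sage/benchmark/benchmark_rag/data")
with (data_dir / "queries.jsonl").open() as f:
    queries = [json.loads(line) for line in f if line.strip()]
print(f"Loaded {len(queries)} queries; first record: {queries[0]}")
```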
Run tests:

```bash
pytest packages/sage-benchmark/
```

Format and lint:

```bash
# Format code
black packages/sage-benchmark/

# Lint code
ruff check packages/sage-benchmark/
```

For detailed documentation on each component:

- See `src/sage/benchmark/rag/README.md` for RAG examples
- See `src/sage/benchmark/benchmark_rag/README.md` for benchmark details
Planned future benchmark modules:

- benchmark_agent: Agent system performance benchmarking
- benchmark_anns: Approximate Nearest Neighbor Search benchmarking
- benchmark_llm: LLM inference performance benchmarking
This package follows the same contribution guidelines as the main SAGE project. See the main
repository's CONTRIBUTING.md.
This project is licensed under the MIT License - see the LICENSE file for details.
Related SAGE packages:

- sage-kernel: Core computation engine for running benchmarks
- sage-libs: RAG components and utilities
- sage-middleware: Vector database services (Milvus, ChromaDB)
- sage-common: Common utilities and data types
- Documentation: https://intellistream.github.io/SAGE-Pub/guides/packages/sage-benchmark/
- Issues: https://github.com/intellistream/SAGE/issues
- Discussions: https://github.com/intellistream/SAGE/discussions
Part of the SAGE Framework | Main Repository