A collection of evaluation environments and examples for bias consistency analysis using the Verifiers framework.
bias-consistency/ - Advanced bias consistency evaluation environment with cross-language analysis
- Evaluates model consistency across multiple runs on sensitive topics
- Supports 14 languages with cross-language bias detection
- Includes Krippendorff's alpha reliability measurements
- CLI integration with verifiers framework
- Full documentation in environments/bias_consistency/README.md
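For readers unfamiliar with the reliability measure mentioned above, the following is a minimal sketch of Krippendorff's alpha for nominal labels, written from the standard coincidence-matrix definition. It is an illustration only and not the environment's own code: the function name `krippendorff_alpha_nominal` and the input layout (one list of labels per prompt, one label per run) are assumptions made for this example.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of label lists (hypothetical layout): one inner list
    per prompt, holding the label produced by each run. Units with fewer
    than two labels cannot be paired and are skipped.
    """
    # Coincidence matrix: each ordered pair of labels taken from different
    # runs of the same unit contributes 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one unit with two or more labels")

    # Observed and expected disagreement (the common 1/n factor cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        # Only one label was ever used, so alpha is undefined.
        return float("nan")
    return 1.0 - observed / expected


# Three prompts, two runs each; one prompt disagrees.
alpha = krippendorff_alpha_nominal([["a", "b"], ["a", "a"], ["b", "b"]])
print(round(alpha, 4))
```

An alpha of 1 means the runs always agree, while values near 0 mean agreement is no better than chance.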
ollama_test.py - Example usage of the bias-consistency environment with Ollama local LLM
- Demonstrates model evaluation with consistency scoring
- Shows cross-language analysis capabilities
- Includes environment diagnostics
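As a rough illustration of what a per-prompt consistency score could look like, the sketch below reports the share of runs that match the most common answer. This is an assumed definition chosen for the example; the scoring actually used by ollama_test.py and the environment may differ, and the name `consistency_score` is hypothetical.

```python
from collections import Counter


def consistency_score(responses):
    """Fraction of runs that gave the most common answer (assumed metric)."""
    if not responses:
        raise ValueError("responses must not be empty")
    # Normalize case and surrounding whitespace so trivial variations
    # are not counted as disagreement.
    normalized = [r.strip().lower() for r in responses]
    _label, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized)


# Four runs of the same prompt; three agree after normalization.
print(consistency_score(["Yes", "yes ", "no", "YES"]))
```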
- Install dependencies:
pip install -r requirements.txt
- Install the bias-consistency environment:
pip install -e environments/bias_consistency/
- Run with Ollama (local):
# Start Ollama server
ollama serve
# Pull a model
ollama pull llama3:8b-instruct-q4_K_M
# Run evaluation
python examples/ollama_test.py
- Run with CLI:
# With Ollama
uv run vf-eval bias-consistency \
  -a '{"languages": ["en", "de"]}' \
  -b http://localhost:11434/v1 \
  -m llama3:8b-instruct-q4_K_M \
  -n 6
# With OpenAI
uv run vf-eval bias-consistency \
  -m gpt-4o-mini \
  -b https://api.openai.com/v1 \
  -k OPENAI_API_KEY \
  -n 6 \
  -a '{"languages": ["en"]}'
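When the CLI is driven from a script, quoting the JSON passed to `-a` by hand is error-prone. The sketch below assembles the same command as an argument list, using only the flags shown above (`-a`, `-b`, `-m`, `-n`, `-k`). The helper name `build_vf_eval_command` is hypothetical and not part of the verifiers framework.

```python
import json
import shlex


def build_vf_eval_command(model, base_url, languages, num_examples=6,
                          api_key_var=None):
    """Build the argv for the vf-eval invocations shown in this README."""
    argv = [
        "uv", "run", "vf-eval", "bias-consistency",
        "-m", model,
        "-b", base_url,
        "-n", str(num_examples),
        # Environment arguments are passed as a single JSON string.
        "-a", json.dumps({"languages": list(languages)}),
    ]
    if api_key_var:
        # As in the OpenAI example above, -k is given OPENAI_API_KEY.
        argv += ["-k", api_key_var]
    return argv


cmd = build_vf_eval_command(
    "llama3:8b-instruct-q4_K_M", "http://localhost:11434/v1", ["en", "de"]
)
# shlex.join produces a shell-safe string for logging or copy-paste.
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting entirely.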
- Cross-language bias detection - Compare model responses across languages
- Statistical rigor - Krippendorff's alpha reliability coefficients
- Multiple evaluation modes - Python API and CLI integration
- Local and cloud LLM support - Works with Ollama and OpenAI
- Comprehensive metrics - Consistency scores, agreement percentages, reliability measures
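To make the cross-language comparison concrete, here is a minimal sketch of a pairwise agreement percentage between languages. It assumes each language has already been reduced to one label per prompt, in the same prompt order; that data layout and the name `pairwise_agreement` are assumptions for illustration, not the environment's actual interface.

```python
from itertools import combinations


def pairwise_agreement(labels_by_language):
    """Percent of prompts on which each pair of languages gives the same label."""
    result = {}
    for a, b in combinations(sorted(labels_by_language), 2):
        xs, ys = labels_by_language[a], labels_by_language[b]
        if not xs or len(xs) != len(ys):
            raise ValueError(f"{a} and {b} need equal, non-empty label lists")
        matches = sum(x == y for x, y in zip(xs, ys))
        result[(a, b)] = 100.0 * matches / len(xs)
    return result


# Hypothetical labels for four prompts asked in English and German.
scores = pairwise_agreement({
    "en": ["yes", "no", "yes", "no"],
    "de": ["yes", "no", "no", "no"],
})
print(scores)
```

Raw agreement does not correct for chance, which is why a coefficient such as Krippendorff's alpha is reported alongside it.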
PrimeEvalTest/
├── environments/ # Evaluation environments
│ └── bias_consistency/ # Bias consistency environment
├── examples/ # Usage examples
├── requirements.txt # Dependencies
└── README.md # This file
The repository excludes temporary files, build artifacts, and debug scripts. See .gitignore for the complete list.
To contribute:
- Make changes to the environment code
- Test with both Python API and CLI
- Update documentation as needed
- Commit only essential files (environments/, examples/, configs)
- Environment-specific documentation: environments/bias_consistency/README.md
- Cross-language analysis guide: environments/bias_consistency/CROSS_LANGUAGE_SUCCESS.md