PathBench is a comprehensive, multi-task, multi-organ benchmark designed for real-world clinical performance evaluation of pathology foundation models towards precision oncology. This interactive web platform provides standardized evaluation metrics and comparative analysis across 20+ state-of-the-art pathology foundation models.
PathBench addresses the critical need for standardized evaluation of pathology foundation models in clinical settings. Our benchmark encompasses:
- 20+ Foundation Models: Including UNI, Virchow, CONCH, Prov-GigaPath, CHIEF, and more
- Multi-organ Coverage: Breast, lung, colorectal, prostate, kidney, and other major organs
- Diverse Task Types: Classification, survival prediction (OS, DFS, DSS), and report generation
- Real Clinical Data: Performance evaluation on both internal and external cohorts
- Interactive Visualization: Comprehensive charts, heatmaps, and comparative analysis tools
- Traditional Models: ResNet50 baseline
- Vision Transformers: UNI, UNI2, Virchow, Virchow2, Prov-GigaPath
- Specialized Pathology Models: CONCH, CHIEF, Phikon, CTransPath
- Multi-modal Models: PLIP, MUSK
- Latest Models: H-Optimus, Hibou-L, GPFM, mSTAR
- IHC Marker Prediction: ER, PR, HER2, Ki67, CK5, and more
- Survival Analysis: Overall Survival (OS), Disease-Free Survival (DFS), Disease-Specific Survival (DSS)
- Histological Grading: Tumor grading and staging
- Performance Heatmaps: Ranking visualization across tasks and organs
- Comparative Charts: Side-by-side model performance comparison
- Statistical Analysis: Mean performance with confidence intervals (a computation sketch follows this list)
- Filtering & Search: Dynamic filtering by organ, task type, and metrics
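To make the mean-with-confidence-interval statistic concrete, here is a minimal TypeScript sketch. This is not the repository's actual code; the fold scores and the normal-approximation 95% interval are illustrative assumptions:

```typescript
// Illustrative only: aggregate k-fold scores (e.g. AUCs) into a mean with a
// normal-approximation 95% confidence interval.
function meanWithCI(scores: number[]): { mean: number; ci: [number, number] } {
  const n = scores.length;
  const mean = scores.reduce((a, b) => a + b, 0) / n;
  // Sample standard deviation across folds
  const sd = Math.sqrt(
    scores.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1)
  );
  const se = sd / Math.sqrt(n);
  // 1.96 is the z-score for a 95% interval under a normal approximation
  return { mean, ci: [mean - 1.96 * se, mean + 1.96 * se] };
}

// Hypothetical five-fold AUC scores for one model on one task
console.log(meanWithCI([0.91, 0.89, 0.93, 0.9, 0.92]));
```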
Prerequisites:
- Node.js 18+
- npm or yarn package manager
```bash
# Clone the repository
git clone https://github.com/birkhoffkiki/PathBench.git
cd PathBench

# Install dependencies
npm install

# Start development server
npm run dev
```

The application will be available at http://localhost:9000.
```bash
# Build the application (automatically generates performance cache)
npm run build

# Serve the built application
npm start
```

PathBench uses a pre-computed aggregated cache system to ensure fast filter interactions:
```bash
# Manually regenerate performance cache (if data updated)
npm run generate-cache
```

The cache is automatically regenerated during the build process. See docs/CACHE_OPTIMIZATION.md for details; a sketch of the aggregation idea follows the list below.
Performance improvements:
- 10-20x faster filter interactions
- <100 ms response time (vs. 1-2 s before)
- Handles 107k+ performance records efficiently
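To illustrate the idea behind the pre-computed cache, here is a minimal TypeScript sketch assuming a hypothetical record shape; the actual format is documented in docs/CACHE_OPTIMIZATION.md and may differ:

```typescript
// Hypothetical record shape; the real fields live in src/data/performance.json.
interface PerformanceRecord {
  model: string;
  taskId: string;
  metric: string; // e.g. "AUC", "C-Index", "BLEU"
  score: number;  // one k-fold result
}

// Group raw per-fold records by (model, task, metric) once at build time,
// so filter interactions read precomputed means instead of re-aggregating
// 107k+ rows on every click.
function buildAggregatedCache(records: PerformanceRecord[]): Map<string, number> {
  const acc = new Map<string, { sum: number; count: number }>();
  for (const r of records) {
    const key = `${r.model}|${r.taskId}|${r.metric}`;
    const e = acc.get(key) ?? { sum: 0, count: 0 };
    e.sum += r.score;
    e.count += 1;
    acc.set(key, e);
  }
  return new Map([...acc].map(([k, { sum, count }]) => [k, sum / count]));
}
```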
Project structure:

```
PathBench/
├── src/
│   ├── app/                 # Next.js app router
│   ├── components/          # React components
│   │   ├── charts/          # Visualization components
│   │   ├── tables/          # Data table components
│   │   ├── filters/         # Filter controls
│   │   └── ui/              # UI components
│   ├── data/                # Data files and utilities
│   │   ├── models.json      # Model metadata
│   │   ├── performance.json # Performance metrics
│   │   └── tasks.ts         # Task definitions
│   ├── types/               # TypeScript type definitions
│   └── lib/                 # Utility functions
├── public/                  # Static assets
└── scripts/                 # Build scripts
```
Each model entry includes the following; a hypothetical TypeScript shape is sketched after the list:
- Basic Info: Name, citation, publication venue
- Architecture: Model architecture and parameters
- Training Data: Pretraining strategy and data sources
- Specifications: Number of slides, patches, and parameters
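As a rough illustration, a model entry might map to an interface like the one below. The field names are assumptions for clarity, not the repository's actual schema:

```typescript
// Hypothetical shape for an entry in src/data/models.json
interface ModelEntry {
  name: string;                // e.g. "UNI"
  citation: string;            // paper reference
  venue: string;               // publication venue
  architecture: string;        // e.g. "ViT-L/16"
  parameters: number;          // parameter count
  pretrainingStrategy: string; // e.g. self-supervised objective
  dataSources: string[];       // pretraining data sources
  numSlides: number;           // slides used for pretraining
  numPatches: number;          // patches used for pretraining
}
```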
Performance data includes the following; a matching sketch follows the list:
- Task Identification: Unique task IDs and descriptions
- Organ Classification: Target organ systems
- Cohort Information: Internal vs. external validation
- Metrics: AUC, C-Index, BLEU scores with k-fold cross-validation results
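A matching sketch for one performance record, again with assumed field names:

```typescript
// Hypothetical shape for a record in src/data/performance.json
interface TaskPerformance {
  taskId: string;                      // unique task identifier
  description: string;                 // task description
  organ: string;                       // target organ system
  cohort: 'internal' | 'external';     // validation cohort type
  metric: 'AUC' | 'C-Index' | 'BLEU';  // evaluation metric
  foldScores: number[];                // k-fold cross-validation results
}
```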
- Frontend: Next.js 15, React 18, TypeScript
- Styling: Tailwind CSS, Radix UI components
- Visualization: ECharts, D3.js
- Deployment: GitHub Pages, Netlify
- Build Tools: Turbopack, PostCSS
- Overview Tab: General statistics and model rankings
- Performance Tab: Detailed performance analysis by task
- Models Tab: Comprehensive model information and specifications
- Model Filter: Select specific models for comparison
- Task Type Filter: Focus on classification, survival, or generation tasks
- Organ Filter: Analyze performance by organ system
- Metric Selector: Choose evaluation metrics (AUC, C-Index, BLEU); a sketch of how these filters combine appears after this list
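Here is a hedged sketch of how these controls might combine on the client; the real implementation lives in src/components/filters/ and may differ:

```typescript
// Hypothetical row and filter shapes for client-side filtering
interface Row {
  model: string;
  taskType: 'classification' | 'survival' | 'generation';
  organ: string;
  metric: 'AUC' | 'C-Index' | 'BLEU';
  score: number;
}

interface Filters {
  models?: string[];
  taskType?: Row['taskType'];
  organ?: string;
  metric?: Row['metric'];
}

// Each unset filter passes everything; set filters are ANDed together
function applyFilters(rows: Row[], f: Filters): Row[] {
  return rows.filter(r =>
    (!f.models || f.models.includes(r.model)) &&
    (!f.taskType || r.taskType === f.taskType) &&
    (!f.organ || r.organ === f.organ) &&
    (!f.metric || r.metric === f.metric)
  );
}
```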
If you submit a model through PrePath, the evaluation follows a two-stage process:
- Selected Cohorts First: Your model is first evaluated on the curated "Selected Cohorts", a representative subset of key validation tasks
- Full Leaderboard for Top Models: Only models ranking in the top 5 on the Selected Cohorts proceed to full evaluation across all validation cohorts

This staged approach ensures efficient resource utilization while maintaining comprehensive benchmarking for leading models; the gating logic is sketched below.
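In pseudocode terms, the gate looks roughly like this. The names and scoring are illustrative assumptions, not PrePath's actual implementation:

```typescript
interface Submission {
  model: string;
  selectedCohortScore: number; // mean score on the Selected Cohorts
}

// A model advances to the full leaderboard only if it ranks in the top 5
// on the Selected Cohorts stage.
function advancesToFullLeaderboard(all: Submission[], model: string): boolean {
  const ranked = [...all].sort(
    (a, b) => b.selectedCohortScore - a.selectedCohortScore
  );
  return ranked.findIndex(s => s.model === model) < 5;
}
```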
- Heatmaps: Color-coded performance rankings (an ECharts sketch follows this list)
- Bar Charts: Comparative performance with error bars
- Pie Charts: Data distribution visualization
- Interactive Tables: Sortable and filterable data tables
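For a flavor of the heatmap rendering, here is a minimal ECharts sketch; the models, tasks, and scores are placeholders, not PathBench data:

```typescript
import * as echarts from 'echarts';

// Placeholder axes and scores for a model-by-task ranking heatmap
const models = ['UNI', 'Virchow', 'CONCH'];
const tasks = ['ER', 'PR', 'HER2'];
const data: [number, number, number][] = [
  [0, 0, 0.92], [1, 0, 0.90], [2, 0, 0.88],
  [0, 1, 0.89], [1, 1, 0.91], [2, 1, 0.87],
  [0, 2, 0.85], [1, 2, 0.88], [2, 2, 0.90],
];

const chart = echarts.init(document.getElementById('heatmap')!);
chart.setOption({
  xAxis: { type: 'category', data: models },
  yAxis: { type: 'category', data: tasks },
  visualMap: { min: 0.8, max: 1.0, calculable: true }, // color scale
  series: [{ type: 'heatmap', data, label: { show: true } }],
});
```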
This work is based on our research paper:
```bibtex
@article{ma2025pathbench,
  title={PathBench: A comprehensive comparison benchmark for pathology foundation models towards precision oncology},
  author={Ma, Jiabo and Xu, Yingxue and Zhou, Fengtao and Wang, Yihui and Jin, Cheng and Guo, Zhengrui and Wu, Jianfeng and Tang, On Ki and Zhou, Huajun and Wang, Xi and Luo, Luyang and Zhang, Zhengyu and Cai, Du and Gao, Zizhao and Wang, Wei and Liu, Yueping and He, Jiankun and Cui, Jing and Li, Zhenhui and Zhang, Jing and Gao, Feng and Zhang, Xiuming and Liang, Li and Chan, Ronald Cheong Kin and Wang, Zhe and Chen, Hao},
  journal={arXiv preprint arXiv:2505.20202},
  year={2025}
}
```

We welcome contributions to PathBench! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This frontend code can be served as a static website for any leaderboard. To add new tasks to the benchmark:
- Update `src/data/tasks.ts` with task metadata (a hypothetical example follows this list)
- Add performance data to `src/data/performance.json`
- Ensure proper model mapping in `src/data/models.json`
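As a hypothetical example (the actual schema in `src/data/tasks.ts` may differ), a new task entry could look like:

```typescript
// Hypothetical task entry for src/data/tasks.ts; the id must match the
// taskId used by the corresponding records in src/data/performance.json.
export const breastErTask = {
  id: 'breast-er-status',
  name: 'ER Status Prediction',
  taskType: 'classification',
  organ: 'Breast',
  metric: 'AUC',
};
```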
This project is licensed under the MIT License - see the LICENSE file for details.
- Need instant support? Please open GitHub Issues: Create an issue
- Feeling academic? Please cite our Paper: arXiv:2505.20202
- Want to see it in action? Please visit our Demo: Live Application
For inquiries regarding institutional collaborations, model benchmarking, or dataset contributions, please contact [email protected].
For technical support, website development inquiries, or platform enhancement suggestions, please reach out at [email protected].