PathBench is a comprehensive, multi-task, multi-organ benchmark designed for real-world clinical performance evaluation of pathology foundation models towards precision oncology. This interactive web platform provides standardized evaluation metrics and comparative analysis across 20+ state-of-the-art pathology foundation models.
PathBench addresses the critical need for standardized evaluation of pathology foundation models in clinical settings. Our benchmark encompasses:
- 20+ Foundation Models: Including UNI, Virchow, CONCH, Prov-GigaPath, CHIEF, and more
- Multi-organ Coverage: Breast, lung, colorectal, prostate, kidney, and other major organs
- Diverse Task Types: Classification, survival prediction (OS, DFS, DSS), and report generation
- Real Clinical Data: Performance evaluation on both internal and external cohorts
- Interactive Visualization: Comprehensive charts, heatmaps, and comparative analysis tools
- Traditional Models: ResNet50 baseline
- Vision Transformers: UNI, UNI2, Virchow, Virchow2, Prov-GigaPath
- Specialized Pathology Models: CONCH, CHIEF, Phikon, CTransPath
- Multi-modal Models: PLIP, MUSK
- Latest Models: H-Optimus, Hibou-L, GPFM, mSTAR
- IHC Marker Prediction: ER, PR, HER2, Ki67, CK5, and more
- Survival Analysis: Overall Survival (OS), Disease-Free Survival (DFS), Disease-Specific Survival (DSS)
- Histological Grading: Tumor grading and staging
- Performance Heatmaps: Ranking visualization across tasks and organs
- Comparative Charts: Side-by-side model performance comparison
- Statistical Analysis: Mean performance with confidence intervals (a computation sketch follows this list)
- Filtering & Search: Dynamic filtering by organ, task type, and metrics
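To make the mean-with-confidence-interval statistic concrete, here is a minimal TypeScript sketch. This is not the repository's actual code; the fold scores and the normal-approximation 95% interval are illustrative assumptions:

```typescript
// Illustrative only: aggregate k-fold scores (e.g. AUCs) into a mean with a
// normal-approximation 95% confidence interval.
function meanWithCI(scores: number[]): { mean: number; ci: [number, number] } {
  const n = scores.length;
  const mean = scores.reduce((a, b) => a + b, 0) / n;
  // Sample standard deviation across folds
  const sd = Math.sqrt(
    scores.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1)
  );
  const se = sd / Math.sqrt(n);
  // 1.96 is the z-score for a 95% interval under a normal approximation
  return { mean, ci: [mean - 1.96 * se, mean + 1.96 * se] };
}

// Hypothetical five-fold AUC scores for one model on one task
console.log(meanWithCI([0.91, 0.89, 0.93, 0.9, 0.92]));
```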
Prerequisites:
- Node.js 18+
- npm or yarn package manager
```bash
# Clone the repository
git clone https://github.com/birkhoffkiki/PathBench.git
cd PathBench

# Install dependencies
npm install

# Start development server
npm run dev
```

The application will be available at http://localhost:9000.
```bash
# Build the application (automatically generates performance cache)
npm run build

# Serve the built application
npm start
```

PathBench uses a pre-computed aggregated cache system to ensure fast filter interactions:
```bash
# Manually regenerate performance cache (if data updated)
npm run generate-cache
```

The cache is automatically regenerated during the build process. See docs/CACHE_OPTIMIZATION.md for details; a sketch of the aggregation idea follows the list below.
Performance improvements:
- 10-20x faster filter interactions
- <100 ms response time (vs. 1-2 s before)
- Handles 107k+ performance records efficiently
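To illustrate the idea behind the pre-computed cache, here is a minimal TypeScript sketch assuming a hypothetical record shape; the actual format is documented in docs/CACHE_OPTIMIZATION.md and may differ:

```typescript
// Hypothetical record shape; the real fields live in src/data/performance.json.
interface PerformanceRecord {
  model: string;
  taskId: string;
  metric: string; // e.g. "AUC", "C-Index", "BLEU"
  score: number;  // one k-fold result
}

// Group raw per-fold records by (model, task, metric) once at build time,
// so filter interactions read precomputed means instead of re-aggregating
// 107k+ rows on every click.
function buildAggregatedCache(records: PerformanceRecord[]): Map<string, number> {
  const acc = new Map<string, { sum: number; count: number }>();
  for (const r of records) {
    const key = `${r.model}|${r.taskId}|${r.metric}`;
    const e = acc.get(key) ?? { sum: 0, count: 0 };
    e.sum += r.score;
    e.count += 1;
    acc.set(key, e);
  }
  return new Map([...acc].map(([k, { sum, count }]) => [k, sum / count]));
}
```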
Project structure:

```
PathBench/
├── src/
│   ├── app/                 # Next.js app router
│   ├── components/          # React components
│   │   ├── charts/          # Visualization components
│   │   ├── tables/          # Data table components
│   │   ├── filters/         # Filter controls
│   │   └── ui/              # UI components
│   ├── data/                # Data files and utilities
│   │   ├── models.json      # Model metadata
│   │   ├── performance.json # Performance metrics
│   │   └── tasks.ts         # Task definitions
│   ├── types/               # TypeScript type definitions
│   └── lib/                 # Utility functions
├── public/                  # Static assets
└── scripts/                 # Build scripts
```
Each model entry includes the following; a hypothetical TypeScript shape is sketched after the list:
- Basic Info: Name, citation, publication venue
- Architecture: Model architecture and parameters
- Training Data: Pretraining strategy and data sources
- Specifications: Number of slides, patches, and parameters
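As a rough illustration, a model entry might map to an interface like the one below. The field names are assumptions for clarity, not the repository's actual schema:

```typescript
// Hypothetical shape for an entry in src/data/models.json
interface ModelEntry {
  name: string;                // e.g. "UNI"
  citation: string;            // paper reference
  venue: string;               // publication venue
  architecture: string;        // e.g. "ViT-L/16"
  parameters: number;          // parameter count
  pretrainingStrategy: string; // e.g. self-supervised objective
  dataSources: string[];       // pretraining data sources
  numSlides: number;           // slides used for pretraining
  numPatches: number;          // patches used for pretraining
}
```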
Performance data includes the following; a matching sketch follows the list:
- Task Identification: Unique task IDs and descriptions
- Organ Classification: Target organ systems
- Cohort Information: Internal vs. external validation
- Metrics: AUC, C-Index, BLEU scores with k-fold cross-validation results
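A matching sketch for one performance record, again with assumed field names:

```typescript
// Hypothetical shape for a record in src/data/performance.json
interface TaskPerformance {
  taskId: string;                      // unique task identifier
  description: string;                 // task description
  organ: string;                       // target organ system
  cohort: 'internal' | 'external';     // validation cohort type
  metric: 'AUC' | 'C-Index' | 'BLEU';  // evaluation metric
  foldScores: number[];                // k-fold cross-validation results
}
```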
- Frontend: Next.js 15, React 18, TypeScript
- Styling: Tailwind CSS, Radix UI components
- Visualization: ECharts, D3.js
- Deployment: GitHub Pages, Netlify
- Build Tools: Turbopack, PostCSS
- Overview Tab: General statistics and model rankings
- Performance Tab: Detailed performance analysis by task
- Models Tab: Comprehensive model information and specifications
- Model Filter: Select specific models for comparison
- Task Type Filter: Focus on classification, survival, or generation tasks
- Organ Filter: Analyze performance by organ system
- Metric Selector: Choose evaluation metrics (AUC, C-Index, BLEU); a sketch of how these filters combine appears after this list
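Here is a hedged sketch of how these controls might combine on the client; the real implementation lives in src/components/filters/ and may differ:

```typescript
// Hypothetical row and filter shapes for client-side filtering
interface Row {
  model: string;
  taskType: 'classification' | 'survival' | 'generation';
  organ: string;
  metric: 'AUC' | 'C-Index' | 'BLEU';
  score: number;
}

interface Filters {
  models?: string[];
  taskType?: Row['taskType'];
  organ?: string;
  metric?: Row['metric'];
}

// Each unset filter passes everything; set filters are ANDed together
function applyFilters(rows: Row[], f: Filters): Row[] {
  return rows.filter(r =>
    (!f.models || f.models.includes(r.model)) &&
    (!f.taskType || r.taskType === f.taskType) &&
    (!f.organ || r.organ === f.organ) &&
    (!f.metric || r.metric === f.metric)
  );
}
```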
If you submit a model through PrePath, the evaluation follows a two-stage process:
- Selected Cohorts First: Your model is first evaluated on the curated "Selected Cohorts", a representative subset of key validation tasks
- Full Leaderboard for Top Models: Only models ranking in the top 5 on the Selected Cohorts proceed to full evaluation across all validation cohorts

This staged approach ensures efficient resource utilization while maintaining comprehensive benchmarking for leading models; the gating logic is sketched below.
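In pseudocode terms, the gate looks roughly like this. The names and scoring are illustrative assumptions, not PrePath's actual implementation:

```typescript
interface Submission {
  model: string;
  selectedCohortScore: number; // mean score on the Selected Cohorts
}

// A model advances to the full leaderboard only if it ranks in the top 5
// on the Selected Cohorts stage.
function advancesToFullLeaderboard(all: Submission[], model: string): boolean {
  const ranked = [...all].sort(
    (a, b) => b.selectedCohortScore - a.selectedCohortScore
  );
  return ranked.findIndex(s => s.model === model) < 5;
}
```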
- Heatmaps: Color-coded performance rankings (an ECharts sketch follows this list)
- Bar Charts: Comparative performance with error bars
- Pie Charts: Data distribution visualization
- Interactive Tables: Sortable and filterable data tables
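For a flavor of the heatmap rendering, here is a minimal ECharts sketch; the models, tasks, and scores are placeholders, not PathBench data:

```typescript
import * as echarts from 'echarts';

// Placeholder axes and scores for a model-by-task ranking heatmap
const models = ['UNI', 'Virchow', 'CONCH'];
const tasks = ['ER', 'PR', 'HER2'];
const data: [number, number, number][] = [
  [0, 0, 0.92], [1, 0, 0.90], [2, 0, 0.88],
  [0, 1, 0.89], [1, 1, 0.91], [2, 1, 0.87],
  [0, 2, 0.85], [1, 2, 0.88], [2, 2, 0.90],
];

const chart = echarts.init(document.getElementById('heatmap')!);
chart.setOption({
  xAxis: { type: 'category', data: models },
  yAxis: { type: 'category', data: tasks },
  visualMap: { min: 0.8, max: 1.0, calculable: true }, // color scale
  series: [{ type: 'heatmap', data, label: { show: true } }],
});
```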
This work is based on our research paper:
```bibtex
@article{ma2025pathbench,
  title={PathBench: A comprehensive comparison benchmark for pathology foundation models towards precision oncology},
  author={Ma, Jiabo and Xu, Yingxue and Zhou, Fengtao and Wang, Yihui and Jin, Cheng and Guo, Zhengrui and Wu, Jianfeng and Tang, On Ki and Zhou, Huajun and Wang, Xi and Luo, Luyang and Zhang, Zhengyu and Cai, Du and Gao, Zizhao and Wang, Wei and Liu, Yueping and He, Jiankun and Cui, Jing and Li, Zhenhui and Zhang, Jing and Gao, Feng and Zhang, Xiuming and Liang, Li and Chan, Ronald Cheong Kin and Wang, Zhe and Chen, Hao},
  journal={arXiv preprint arXiv:2505.20202},
  year={2025}
}
```

We welcome contributions to PathBench! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This frontend code can be served as a static website for any leaderboard. To add new tasks to the benchmark:
- Update `src/data/tasks.ts` with task metadata (a hypothetical example follows this list)
- Add performance data to `src/data/performance.json`
- Ensure proper model mapping in `src/data/models.json`
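As a hypothetical example (the actual schema in `src/data/tasks.ts` may differ), a new task entry could look like:

```typescript
// Hypothetical task entry for src/data/tasks.ts; the id must match the
// taskId used by the corresponding records in src/data/performance.json.
export const breastErTask = {
  id: 'breast-er-status',
  name: 'ER Status Prediction',
  taskType: 'classification',
  organ: 'Breast',
  metric: 'AUC',
};
```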
This project is licensed under the MIT License - see the LICENSE file for details.
- Need instant support? Please open GitHub Issues: Create an issue
- Feeling academic? Please cite our Paper: arXiv:2505.20202
- Want to see it in action? Please visit our Demo: Live Application
For inquiries regarding institutional collaborations, model benchmarking, or dataset contributions, please contact [email protected].
For technical support, website development inquiries, or platform enhancement suggestions, please reach out at [email protected].