Hist2ST: Spatial Transcriptomics Prediction from Histology

Hist2ST is a Transformer + GNN model that predicts spatial gene expression from H&E histology images. Original project authors: Yuansong Zeng, Zhuoyi Wei, Weijiang Yu, Rui Yin, Bingling Li, Zhonghui Tang, Yutong Lu, Yuedong Yang.

This repository adds a sample-agnostic pipeline for running zero-shot inference on external HEST data, together with a complete downstream analysis workflow.

Overview

Model: Hist2ST combines a CNN/Transformer for global context with a GNN for local spatial structure, and predicts gene expression through ZINB/NB output heads.
Our contribution: A universal, sample-agnostic inference + analysis pipeline for HEST data with clean outputs and shell wrappers.
Upstream usage: Training/tutorial notebooks from the original repo are kept for reference.

Quick Start (External HEST Inference)

# 1) Ensure data + model are present
#    data/hest_data/st/{SAMPLE_ID}.h5ad
#    data/hest_data/wsis/{SAMPLE_ID}.tif
#    model/5-Hist2ST.ckpt

# 2) Make scripts executable (first time only)
chmod +x run_prediction.sh run_analysis.sh

# 3) Run prediction and analysis
./run_prediction.sh MEND159
./run_analysis.sh MEND159
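
To process several samples in one go, the two wrappers can be driven from Python. This is a minimal sketch, assuming it is run from the repository root and that each wrapper exits with a non-zero status on failure; the helper names `pipeline_commands` and `run_pipeline` are hypothetical and not part of this repository.

```python
import subprocess


def pipeline_commands(sample_ids):
    """Yield the Quick Start wrapper invocations for each sample ID, in order."""
    for sample_id in sample_ids:
        yield ["./run_prediction.sh", sample_id]
        yield ["./run_analysis.sh", sample_id]


def run_pipeline(sample_ids):
    """Run prediction, then analysis, for every sample (hypothetical helper)."""
    for cmd in pipeline_commands(sample_ids):
        # check=True raises CalledProcessError at the first failing step
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(["MEND159"])` is then equivalent to step 3 above.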

Input layout

data/hest_data/
├── st/{SAMPLE_ID}.h5ad          # counts + spatial
└── wsis/{SAMPLE_ID}.tif         # H&E fallback image

model/
└── 5-Hist2ST.ckpt               # pretrained weights

Optional gene list: data/her_hvg_cut_1000.npy (if present, its first 785 genes are used).
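
Before running the wrappers, it can help to confirm that the files in this layout exist and to inspect the gene list. The sketch below is illustrative only: `missing_inputs` and `load_gene_list` are hypothetical helpers, and it assumes the .npy file stores a one-dimensional array of gene symbols. `allow_pickle=True` is only needed if the array has object dtype and should only be used on files you trust.

```python
from pathlib import Path

import numpy as np


def missing_inputs(sample_id, data_dir="data/hest_data",
                   model_path="model/5-Hist2ST.ckpt"):
    """Return the expected input files that are not present on disk."""
    data_dir = Path(data_dir)
    expected = [
        data_dir / "st" / f"{sample_id}.h5ad",   # counts + spatial
        data_dir / "wsis" / f"{sample_id}.tif",  # H&E image
        Path(model_path),                        # pretrained weights
    ]
    return [p for p in expected if not p.is_file()]


def load_gene_list(path, n_genes=785):
    """Load gene names from a .npy file and keep the first n_genes entries.

    Assumption: the file holds a 1-D array of gene symbols.
    """
    genes = np.load(path, allow_pickle=True)
    return [str(g) for g in genes[:n_genes]]
```

An empty list from `missing_inputs("MEND159")` means all three required files were found.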

Output layout

output/{SAMPLE_ID}/
├── predictions/
│   ├── {SAMPLE_ID}_pred.h5ad         # 785-gene predictions
│   └── correlation_results.npy       # Pearson/Spearman + overlap genes
├── analysis/
│   ├── {SAMPLE_ID}_analyzed.h5ad     # processed AnnData
│   ├── clustering_results.csv
│   └── marker_genes.csv
├── visualizations/                   # UMAP/t-SNE/spatial plots
└── logs/                             # pipeline logs
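
For orientation, the per-gene Pearson correlation over overlapping genes can be computed along the following lines. This is a NumPy-only sketch under stated assumptions, not the code in predict_hest_universal.py: it assumes predicted and observed matrices have shape [n_spots, n_genes] with spots in the same order, and the function name `per_gene_pearson` is hypothetical. The internal structure of correlation_results.npy is not documented here, so the sketch does not try to read it.

```python
import numpy as np


def per_gene_pearson(pred, pred_genes, obs, obs_genes):
    """Pearson r per gene across spots, restricted to genes present in both.

    Returns {gene: r}. Genes with zero variance in either matrix get NaN.
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    obs_index = {g: j for j, g in enumerate(obs_genes)}
    result = {}
    for i, gene in enumerate(pred_genes):
        j = obs_index.get(gene)
        if j is None:
            continue  # gene was not measured in the observed data
        x = pred[:, i] - pred[:, i].mean()
        y = obs[:, j] - obs[:, j].mean()
        denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
        result[gene] = float((x * y).sum() / denom) if denom > 0 else float("nan")
    return result
```

Spearman correlation follows the same pattern after replacing each column by its ranks.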

Commands and scripts

run_prediction.sh SAMPLE_ID — shell wrapper for inference
run_analysis.sh SAMPLE_ID — shell wrapper for downstream analysis
predict_hest_universal.py — universal prediction (loads .h5ad + .tif, builds KNN graph, runs Hist2ST)
analyze_hest_universal.py — QC, HVG, PCA/UMAP/t-SNE, clustering, DE, spatial plots

Advanced (Python flags):

python predict_hest_universal.py SAMPLE_ID \
  --device auto --data_dir data/hest_data --output_dir output
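
The command-line interface shown above could be declared with argparse roughly as follows. This is a hypothetical reconstruction for illustration: only the positional sample ID and the three flag names come from the command above, while the defaults and help strings are assumptions. Check predict_hest_universal.py for the actual definitions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown above; defaults are assumed."""
    parser = argparse.ArgumentParser(
        description="Zero-shot Hist2ST inference on a HEST sample"
    )
    parser.add_argument("sample_id", help="HEST sample ID, e.g. MEND159")
    parser.add_argument("--device", default="auto",
                        help="compute device; 'auto' is the value used above")
    parser.add_argument("--data_dir", default="data/hest_data",
                        help="directory containing st/ and wsis/")
    parser.add_argument("--output_dir", default="output",
                        help="root directory for per-sample outputs")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["MEND159", "--device", "auto"])
    print(args.sample_id, args.device, args.data_dir, args.output_dir)
```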

Technical notes

Config: 5-7-2-8-4-16-32 (kernel_size-patch_size-depth1-depth2-depth3-heads-channel), n_genes=785, dropout=0.2
Weights: loaded with strict=False to allow partial compatibility
Graph: KNN with k=6; pruneTag switches dynamically between Grid and NA depending on the coordinate range
Coordinates: normalized to integer indices (0–63) for embeddings
Seeds: fixed at 12000 for reproducibility
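
The coordinate and graph notes above can be illustrated with a small NumPy sketch. It rests on assumptions that this README does not confirm: min-max scaling per axis, Euclidean distance, no self-loops, and symmetrized edges. It also does not reproduce the pruneTag logic. The function names are hypothetical; graph_construction.py remains the authoritative implementation.

```python
import numpy as np


def normalize_coords(xy, n_bins=64):
    """Map raw spot coordinates to integer indices in [0, n_bins - 1] per axis."""
    xy = np.asarray(xy, dtype=float)
    lo = xy.min(axis=0)
    span = xy.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero when an axis is constant
    scaled = (xy - lo) / span * (n_bins - 1)
    return np.rint(scaled).astype(np.int64)


def knn_adjacency(xy, k=6):
    """Symmetric 0/1 adjacency linking each spot to its k nearest neighbours."""
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    # Dense pairwise distances: O(n^2) memory, fine for a sketch
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude each spot from its own neighbours
    k = min(k, n - 1)
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=np.float32)
    rows = np.repeat(np.arange(n), k)
    adj[rows, nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # keep an edge if either endpoint chose it
```

With these helpers, `normalize_coords(xy)` yields the [N, 2] index array and `knn_adjacency(xy)` the [N, N] adjacency matrix for N spots.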

Minimal model usage (reference)

import torch
from HIST2ST import Hist2ST

model = Hist2ST(depth1=2, depth2=8, depth3=4,
                n_genes=785, kernel_size=5, patch_size=7,
                heads=16, channel=32, dropout=0.2,
                zinb=0.25, nb=False, bake=5, lamb=0.5)
# patches: [B, N, 3, H, W]
# coords:  [B, N, 2] (long indices 0..63)
# adj:     [N, N]
# out:     [B, N, n_genes]

Requirements

Python >= 3.7, PyTorch >= 1.10, pytorch-lightning >= 1.4, scanpy >= 1.8, scipy, Pillow (PIL), tqdm

Troubleshooting

"Pre-trained model not found": put 5-Hist2ST.ckpt under model/
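"No overlapping genes": check that the .npy gene list uses the same gene symbols as the dataset, or remove the list to fall back to the dataset's own genes
Very low correlations: expected for zero-shot, cross-dataset inference; predictions can still be useful
PIL DecompressionBombWarning: expected for large WSIs; safe to ignore when the image comes from a trusted source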
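
When the "No overlapping genes" error appears, a quick comparison of the two gene lists often narrows down the cause. The helper below is a hypothetical diagnostic, not part of the pipeline. It counts shared symbols exactly and again after stripping whitespace and upper-casing. A gap between the two counts points to a naming-convention mismatch; if both are zero, the dataset may use a different identifier type altogether.

```python
def gene_overlap_report(model_genes, data_genes):
    """Count shared gene symbols exactly and after case/whitespace normalization."""

    def normalize(genes):
        return {str(g).strip().upper() for g in genes}

    exact = set(model_genes) & set(data_genes)
    normalized = normalize(model_genes) & normalize(data_genes)
    return {
        "n_model": len(set(model_genes)),
        "n_data": len(set(data_genes)),
        "exact_overlap": len(exact),
        "normalized_overlap": len(normalized),
    }
```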

Datasets (upstream)

HER2+ breast tumor ST: https://github.com/almaan/her2st
cSCC 10x Visium (GSE144240)
Synapse mirror of trained models and data indices (see upstream paper)

Citation (upstream)

Please cite the original authors:

@article{zengys,
  title={Spatial Transcriptomics Prediction from Histology jointly through Transformer and Graph Neural Networks},
  author={Yuansong Zeng and Zhuoyi Wei and Weijiang Yu and Rui Yin and Bingling Li and Zhonghui Tang and Yutong Lu and Yuedong Yang},
  journal={bioRxiv},
  year={2021},
  publisher={Cold Spring Harbor Laboratory}
}

Repository: jasperyeoh/hist2st-external-inference
