ImmunoStruct

ImmunoStruct enables multimodal deep learning for immunogenicity prediction

Table of Contents
  1. News
  2. About The Project
  3. Citation
  4. Getting Started
  5. Usage
  6. Model Architecture
  7. Troubleshooting
  8. Contributing
  9. License
  10. Contact
  11. Acknowledgments

News

☐ TODO: create and release an end-to-end tool.
✅ Dec 31, 2025: Published in Nature Machine Intelligence.
✅ Dec 04, 2025: Informally presented at NeurIPS 2025 (did not submit, no dual-submission concern).
✅ Aug 18, 2025: Received the Colton Innovation Fund from Colton Center for Autoimmunity at Yale University.
✅ May 06, 2025: Submitted to Nature Machine Intelligence.
✅ Nov 05, 2024: Presented at MoML@MIT 2024 (non-archival abstract & poster).
✅ Nov 01, 2024: Preprint released.

About The Project

[Figure: ImmunoStruct architecture]

ImmunoStruct is a multimodal deep learning framework that integrates sequence, structural, and biochemical information to predict multi-allele class-I peptide-MHC immunogenicity. By leveraging multimodal data from 26,049 peptide-MHCs and jointly modeling sequence and structure, ImmunoStruct significantly improves immunogenicity prediction performance for both infectious disease epitopes and cancer neoepitopes.

Key Features

  • Multimodal Integration: Combines peptide-MHC protein sequence, structure, and biochemical properties
  • Novel Cancer-Wildtype Contrastive Learning: Enhances specificity for cancer neoepitope detection
  • Enhanced Interpretability: Provides insights into the substructural basis of immunogenicity

[Figure: Contrastive learning approach]

Citation

If you use ImmunoStruct in your research, please cite our paper:

BibTeX:

@article{givechian2025immunostruct,
  title={ImmunoStruct enables multimodal deep learning for immunogenicity prediction},
  author={Givechian, Kevin Bijan and Rocha, Jo{\~a}o Felipe and Liu, Chen and Yang, Edward and Tyagi, Sidharth and Greene, Kerrie and Ying, Rex and Caron, Etienne and Iwasaki, Akiko and Krishnaswamy, Smita},
  journal={Nature Machine Intelligence},
  pages={1--14},
  year={2025},
  publisher={Nature Publishing Group UK London}
}

Nature format:
Givechian, K.B., Rocha, J.F., Liu, C. et al. ImmunoStruct enables multimodal deep learning for immunogenicity prediction. Nat Mach Intell (2025). https://doi.org/10.1038/s42256-025-01163-y

Getting Started

To get ImmunoStruct up and running locally, follow these steps.

Prerequisites

Before installation, ensure you have:

  • Python 3.10+
  • CUDA-compatible GPU (recommended)
  • Conda package manager
  • Weights & Biases account for experiment tracking

Dependencies

  • python 3.10
  • torch 2.1.2
  • dgl
  • torch_geometric 2.5.3

Installation

  1. Clone the repository

    git clone https://github.com/KrishnaswamyLab/ImmunoStruct.git
    cd ImmunoStruct
  2. Create and activate conda environment

    conda create --name immuno python=3.10 -c anaconda -c conda-forge
    conda activate immuno
  3. Install core dependencies

    conda install cudatoolkit=11.2 wandb pydantic -c conda-forge
    conda install scikit-image pillow matplotlib seaborn tqdm -c anaconda
  4. Install PyTorch

    python -m pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
  5. Install DGL

    python -m pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/cu118/repo.html
    python -m pip install torchdata==0.7.1
  6. Install PyTorch Geometric and related packages

    python -m pip install torch-scatter==2.1.2+pt21cu118 torch-sparse==0.6.18+pt21cu118 torch-cluster==1.6.3+pt21cu118 torch-spline-conv==1.2.2+pt21cu118 torch_geometric==2.5.3 numpy==1.26.3 -f https://data.pyg.org/whl/torch-2.1.2+cu118.html
  7. Install additional packages

    python -m pip install "graphein[extras]"
    python -m pip install lifelines
    python -m pip install -U phate
    python -m pip install multiscale-phate
  8. Set up environment variables (if needed)

    export LD_LIBRARY_PATH=/path/to/conda/envs/immuno/lib:$LD_LIBRARY_PATH
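After installation, it can help to confirm that the pinned versions actually landed in the environment. The following is a minimal sketch, not part of the repository: the `EXPECTED` table and the `check_versions` helper are hypothetical names, and the pins are copied from the installation steps above (dgl is left unpinned there, so only its presence is checked).

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the installation steps; None means "any installed version".
EXPECTED = {
    "torch": "2.1.2",
    "torchvision": "0.16.2",
    "torchaudio": "2.1.2",
    "torchdata": "0.7.1",
    "torch_geometric": "2.5.3",
    "numpy": "1.26.3",
    "dgl": None,
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, pin in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Local build tags such as "+cu118" are ignored when comparing.
        base = installed.split("+")[0] if installed else None
        ok = installed is not None and (pin is None or base == pin)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "ok" if ok else "CHECK"
        print(f"{name:16s} {installed or 'not installed':20s} {status}")
```

Run it inside the activated `immuno` environment; any row marked `CHECK` points at a package that is missing or differs from the pin.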

Usage

Data Preparation

Place the following files in the data/ folder:

  • cedar_data_final_with_mprop1_mprop2_v2.txt
  • complete_score_Mprops_1_2_smoothed_sasa_v2.txt
  • HLA_27_seqs_csv.csv

Additionally, ensure you have these folders:

  • graph_pyg_Cancer
  • graph_pyg_IEDB
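Before training, a quick check that the inputs listed above are in place can save a failed run. This is a sketch under stated assumptions: `missing_inputs` is a hypothetical helper, and the location of the two graph folders is not specified here, so the code defaults to looking for them inside `data/` and lets you override that with `graph_root`.

```python
from pathlib import Path

REQUIRED_FILES = [
    "cedar_data_final_with_mprop1_mprop2_v2.txt",
    "complete_score_Mprops_1_2_smoothed_sasa_v2.txt",
    "HLA_27_seqs_csv.csv",
]
REQUIRED_DIRS = ["graph_pyg_Cancer", "graph_pyg_IEDB"]


def missing_inputs(data_dir="data", graph_root=None):
    """List required files and folders that are absent.

    graph_root defaults to data_dir. That default is an assumption:
    the README does not state where the graph folders must live.
    """
    data_dir = Path(data_dir)
    graph_root = Path(graph_root) if graph_root is not None else data_dir
    missing = [str(data_dir / f) for f in REQUIRED_FILES
               if not (data_dir / f).is_file()]
    missing += [str(graph_root / d) for d in REQUIRED_DIRS
                if not (graph_root / d).is_dir()]
    return missing


if __name__ == "__main__":
    problems = missing_inputs()
    if problems:
        print("Missing inputs:")
        for path in problems:
            print("  " + path)
    else:
        print("All expected inputs found.")
```

Run it from the repository root so that the relative `data` path resolves correctly.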

Generate PyG graph files:

The PyG graph files can be generated from the corresponding AlphaFold folders with the following command:

python immunostruct/preprocessing/cancer_graph_construction_new_KBG.py

Training and Testing

  1. Set up Weights & Biases

    Create a project on Weights & Biases whose name matches the project name used by the training script.

  2. Run Experiments

    # HybridModelv2 with full sequence and sequence loss
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model HybridModelv2 --wandb-username YOUR_WANDB_USERNAME
    
    # HybridModel with full sequence and sequence loss
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model HybridModel --wandb-username YOUR_WANDB_USERNAME
    
    # Sequence with fingerprint model
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model SequenceFpModel --wandb-username YOUR_WANDB_USERNAME
    
    # Sequence-only model
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model SequenceModel --wandb-username YOUR_WANDB_USERNAME
    
    # Structure-only model
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --model StructureModel --wandb-username YOUR_WANDB_USERNAME
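The five commands above differ only in the model name and in whether `--sequence-loss` is passed, so they can be generated from a small table. The sketch below is not part of the repository: `build_commands` and `run_all` are hypothetical helpers, and only the script name and flags shown above are used. It defaults to a dry run that just prints each command.

```python
import shlex
import subprocess
import sys

SCRIPT = "train_PropIEDB_PropCancer_ImmunoCancer.py"

# (model name, pass --sequence-loss?), mirroring the commands listed above.
EXPERIMENTS = [
    ("HybridModelv2", True),
    ("HybridModel", True),
    ("SequenceFpModel", True),
    ("SequenceModel", True),
    ("StructureModel", False),
]


def build_commands(wandb_username, python=sys.executable):
    """Return one argument list per experiment."""
    commands = []
    for model, use_sequence_loss in EXPERIMENTS:
        cmd = [python, SCRIPT, "--full-sequence"]
        if use_sequence_loss:
            cmd.append("--sequence-loss")
        cmd += ["--model", model, "--wandb-username", wandb_username]
        commands.append(cmd)
    return commands


def run_all(wandb_username, dry_run=True):
    """Print every command; execute each in order unless dry_run is set."""
    for cmd in build_commands(wandb_username):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("YOUR_WANDB_USERNAME", dry_run=True)
```

Set `dry_run=False` once the printed commands look right. The runs execute sequentially, so a failure in one model stops the remaining ones.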

Troubleshooting

Common Issues

GLIBCXX Error

ImportError: $some_path/libstdc++.so.6: version 'GLIBCXX_3.4.29' not found

Solution: Add your conda environment path to LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=/path/to/conda/envs/immuno/lib:$LD_LIBRARY_PATH
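If you are unsure what to substitute for `/path/to/conda/envs/immuno`, the active environment can report its own prefix. The sketch below is illustrative only: `conda_lib_dir` and `export_line` are hypothetical helpers, and the code assumes that conda has set `CONDA_PREFIX` on activation, falling back to `sys.prefix` otherwise.

```python
import os
import sys


def conda_lib_dir(environ=None):
    """Return the lib/ directory of the active environment as a string.

    Prefers CONDA_PREFIX (set when a conda env is activated) and falls
    back to sys.prefix, the prefix of the running interpreter.
    """
    environ = os.environ if environ is None else environ
    prefix = environ.get("CONDA_PREFIX") or sys.prefix
    return prefix.rstrip("/") + "/lib"


def export_line(environ=None):
    """Build the shell line that prepends that lib/ to LD_LIBRARY_PATH."""
    return f"export LD_LIBRARY_PATH={conda_lib_dir(environ)}:$LD_LIBRARY_PATH"


if __name__ == "__main__":
    # Copy the printed line into your shell or shell profile.
    print(export_line())
```

Run it with the `immuno` environment activated, then paste the printed line into the shell that launches training.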

CUDA Compatibility Issues

  • Ensure your CUDA version matches the PyTorch installation
  • Verify GPU availability with torch.cuda.is_available()
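The two checks above can be combined into one short report. This is a minimal sketch, with `cuda_report` as a hypothetical helper name. It uses only standard PyTorch attributes (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`, `torch.cuda.get_device_name()`) and returns a dictionary instead of raising if PyTorch cannot be imported.

```python
def cuda_report():
    """Collect basic CUDA facts from PyTorch, or explain why that failed."""
    try:
        import torch
    except ImportError as exc:
        return {"torch_installed": False, "error": str(exc)}

    available = torch.cuda.is_available()
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": available,
        "device_count": torch.cuda.device_count() if available else 0,
    }
    if available:
        report["device_0"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

With the installation steps above, `built_with_cuda` would be expected to report a CUDA 11.8 build. If `cuda_available` is false on a GPU machine, the driver is a likely place to look.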

Memory Issues

  • Reduce batch size in training scripts
  • Use gradient checkpointing for large models

Wandb Authentication

  • Log in to Weights & Biases: wandb login
  • Ensure project names match between script and Wandb dashboard
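To check for stored credentials without starting a run, something like the following may help. It is a sketch resting on two assumptions about the Weights & Biases client that should be verified against its documentation: that `wandb login` saves the API key in `~/.netrc` under the host `api.wandb.ai`, and that the `WANDB_API_KEY` environment variable is honored as an alternative. `wandb_credentials_found` is a hypothetical helper.

```python
import netrc
import os
from pathlib import Path


def wandb_credentials_found(environ=None, netrc_path=None, host="api.wandb.ai"):
    """Return True if an API key appears to be configured.

    Assumes the default cloud host; self-hosted servers use another host.
    """
    environ = os.environ if environ is None else environ
    if environ.get("WANDB_API_KEY"):
        return True
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    try:
        return netrc.netrc(str(path)).authenticators(host) is not None
    except (OSError, netrc.NetrcParseError):
        # Missing, unreadable, or malformed netrc file: treat as no credentials.
        return False


if __name__ == "__main__":
    if wandb_credentials_found():
        print("W&B credentials found.")
    else:
        print("No W&B credentials found; run `wandb login`.")
```

The check only confirms that a key is present, not that it is valid or that the project exists.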

License

Distributed under the Yale License. See LICENSE.txt for more information.

Contact

Krishnaswamy Lab - @KrishnaswamyLab

Project Link: https://github.com/KrishnaswamyLab/ImmunoStruct
