Official implementation of Rationalized All-Atom Protein Design with Unified Multi-modal Bayesian Flow.
You can use our docker image to quickly set up the environment:
docker pull hanlinwu/probayes
Next, download the data following the instructions here.
If Docker is not available, you can build the environment manually. Run the following command to install most of the packages:
conda env create -f probayes_env.yml
The PyRosetta version we use is pyrosetta-2024.35+release.45abd6a-cp310-cp310-linux_x86_64.whl.
Download this file to your server and install it, followed by the remaining dependencies:
pip install pyrosetta-2024.35+release.45abd6a-cp310-cp310-linux_x86_64.whl
pip install git+https://github.com/pandegroup/pdbfixer.git
pip install torch-scatter==2.1.2+pt20cu117 -f https://data.pyg.org/whl/torch-2.0.0+cu117.html
pip install -e .
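Once the installation steps above finish, a quick import check can confirm the environment is usable before downloading data. The sketch below is not part of the repository; it assumes the import names `torch`, `torch_scatter`, `pyrosetta`, `pdbfixer`, and `probayes` (the last one inferred from `pip install -e .`), so adjust the list if your setup differs.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed above.
    required = ["torch", "torch_scatter", "pyrosetta", "pdbfixer", "probayes"]
    missing = missing_modules(required)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

The check only looks modules up on the import path without importing them, so it runs quickly and does not initialize PyRosetta or CUDA.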
Download the pre-processed dataset files from this link and unzip them in the project root.
Raw data can be found in PepBench&PepBDB and SAbDab&RAbD.
Download the pre-computed cache files for Bayesian flow from this link and unzip them in the project root.
We provide our checkpoints and the designed PDB files for benchmark (PepBench, PepBDB, RAbD) evaluation here.
After installation, the project structure should look like this:
/probayes
|-- README.md
|-- cache_files
|-- ckpts
|-- configs
|-- logs
|-- openfold
|-- probayes
|-- probayes.egg-info
|-- probayes_data
|-- probayes_data.zip
|-- probayes_env.yml
|-- remote
|-- scripts
|-- setup.py
|-- train.py
|-- train_antibody.py
|-- train_antibody_ddp.py
|-- train_pep.py
`-- train_pep_ddp.py
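To compare a local checkout against the tree above, a small script can report which entries are still missing. This is a minimal sketch rather than a tool shipped with the project; the `EXPECTED` list is a hand-picked subset of the tree (data, caches, checkpoints, configs, code, scripts), and the function name is hypothetical.

```python
from pathlib import Path

# Hand-picked subset of the tree above; extend as needed.
EXPECTED = ["cache_files", "ckpts", "configs", "probayes", "probayes_data", "scripts"]


def missing_entries(root, expected=EXPECTED):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Run from the project root.
    missing = missing_entries(".")
    print("Missing entries:", ", ".join(missing) if missing else "none")
```

A missing `probayes_data`, `cache_files`, or `ckpts` entry usually means the corresponding archive has not been downloaded or unzipped in the project root yet.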
Now you can reproduce our benchmark metrics.
You may need to add execute permission to the binaries used for DockQ evaluation, e.g.
chmod +x probayes/remote/PepGLAD/evaluation/DockQ/fnat
chmod +x probayes/remote/ppflow/bin/TMscore/TMscore
Then compile TMscore.cpp:
g++ -static -O3 -ffast-math -lm -o evaluation/TMscore evaluation/TMscore.cpp
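Before launching an evaluation, it can help to confirm that the binaries named in the `chmod` commands above exist and are executable. The following is an illustrative sketch only; the helper name is hypothetical, and the paths are copied from the commands above and resolved relative to the current working directory.

```python
import os

# Binaries named in the chmod commands above.
BINARIES = [
    "probayes/remote/PepGLAD/evaluation/DockQ/fnat",
    "probayes/remote/ppflow/bin/TMscore/TMscore",
]


def not_executable(paths):
    """Return the paths that are missing or lack execute permission."""
    return [p for p in paths if not (os.path.isfile(p) and os.access(p, os.X_OK))]


if __name__ == "__main__":
    problems = not_executable(BINARIES)
    print("Not executable:", ", ".join(problems) if problems else "none")
```

Any path reported here needs either the `chmod +x` step or a fresh compile.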
All training and evaluation scripts can be found in scripts/. To reproduce the benchmark metric scores:
- Peptide codesign
source scripts/eval_ckpt_peptide.sh
- Peptide Binding Conformation Generation / Folding
source scripts/eval_ckpt_folding.sh
- Antibody design
source scripts/eval_ckpt_antibody.sh
You can select the dataset to evaluate by changing the CKPT_DIR variable in the corresponding bash script.
We provide our training scripts here:
- Peptide codesign
source scripts/train_ddp_antibody_codesign.sh -d pepbench
- Peptide Binding Conformation Generation / Folding
source scripts/train_ddp_pep_folding.sh -d pepbench
- Antibody design
source scripts/train_ddp_antibody_codesign.sh
The default settings require 4 GPUs with 80 GB of memory each, and training takes 10 to 24 hours.
You can check the benchmark scores in wandb.
We would like to express our gratitude to the following repositories for their valuable contributions: