Official implementation of the ICLR 2024 Oral paper "Unified Generative Modeling of 3D Molecules with Bayesian Flow Networks".
For applying GeoBFN to Structure-Based Drug Design (SBDD), please refer to our recent work, MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space (ICML 2024), with code available at https://github.com/AlgoMole/MolCRAFT.
You will need a host machine with a GPU, and Docker with nvidia-container-runtime enabled.
Tip
- This repo provides an easy-to-use script to install Docker and nvidia-container-runtime: in ./GeoBFN/docker, run sudo ./setup_docker_for_host.sh to set up your host machine.
- You can also refer to the install guide if you don't have them installed.
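Before running the setup script, it can help to see which prerequisites are already present. The following is a minimal sketch, not part of the repo: `have_cmd` is a hypothetical helper, and `nvidia-smi` is assumed here only as a common indicator that an NVIDIA driver is installed. It does not verify that nvidia-container-runtime itself is configured.

```shell
#!/bin/sh
# Preflight sketch: report whether the tools named above are on PATH.
# have_cmd is a hypothetical helper, not part of the GeoBFN repo.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# nvidia-smi is an assumption: it usually ships with the NVIDIA driver.
for tool in docker nvidia-smi; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

If either tool is reported missing, the setup script or the install guide mentioned above is the place to start.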
Clone the repo with git clone,
git clone https://github.com/AlgoMole/GeoBFN.git
Set up the environment with Docker,
cd ./GeoBFN/docker
make # a make is all you need
Note
- The make command automatically builds the Docker image and runs the container, with your host home directory mounted to the ${HOME}/home directory inside the container. This is the highly recommended route.
- If you need to set up the environment manually, please refer to the files docker/Dockerfile, docker/asset/requirements.txt and docker/asset/apt_packages.txt.
Inside the container, find the path to your repo. Inside GeoBFN/, run
make -f train.mk
Note
- This command automatically attempts to download the dataset if it does not exist, then runs the training script python geobfn_train.py --config_file configs/bfn4molgen.yaml --epochs 3000 on the default GPU. To use a different GPU, run export CUDA_VISIBLE_DEVICES=<gpu_id> before the make -f train.mk command.
- Comment out or delete the --no_wandb option in train.mk if you want to use wandb to log the training process. You will probably be prompted to enter your wandb API key.
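The GPU selection described in the note can be wrapped so that the setting does not leak into the rest of the shell session. This is a minimal sketch under assumptions: `run_training` is a hypothetical wrapper that does not exist in the repo, and the default of GPU 0 is chosen only for illustration.

```shell
#!/bin/sh
# Sketch: run the training target on a chosen GPU.
# run_training is a hypothetical wrapper, not part of the GeoBFN repo.
run_training() {
    gpu_id="${1:-0}"   # default of 0 is an assumption for illustration
    (
        # The subshell keeps the export from leaking into the caller.
        export CUDA_VISIBLE_DEVICES="$gpu_id"
        make -f train.mk
    )
}

# Usage, from inside GeoBFN/ in the container:
#   run_training 1
```

Exporting the variable directly, as the note describes, works just as well; the wrapper only limits the variable's scope to one training run.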
Caution
- You may encounter a connection error if your server is in China. In that case, manually download the dataset from Baidu Netdisk and put it in the ./GeoBFN directory with scp <path/to/local/qm9.tar.gz> <username>@<remotehost>:<path/to/remote/GeoBFN/>, then run the make -f train.mk command again once the dataset is in place.
- Alternatively, you can use a proxy to allow the script to download the dataset automatically.
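After a manual copy, a quick check can confirm the archive landed where expected before training is restarted. This is a minimal sketch: `dataset_present` is a hypothetical helper, and the expected location (qm9.tar.gz directly inside the GeoBFN directory) is inferred from the scp destination above rather than confirmed against train.mk.

```shell
#!/bin/sh
# Sketch: check for a manually copied dataset archive before training.
# dataset_present is a hypothetical helper; the qm9.tar.gz location is
# inferred from the scp destination, not confirmed against train.mk.
dataset_present() {
    [ -f "${1:-.}/qm9.tar.gz" ]
}

# Run from inside GeoBFN/ (the current directory is checked).
if dataset_present .; then
    echo "qm9.tar.gz found in the current directory"
else
    echo "qm9.tar.gz not found; train.mk will attempt the download"
fi
```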
Tip
- It is better to run the training command inside a tmux session, as training takes a long time to finish.
- Exiting the container won't stop it. Run make from the host at GeoBFN/docker to log in to the running container again. If you really need to kill the container, run make kill from GeoBFN/docker.
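The tmux advice above can be sketched as a small helper. The function name `train_session` and the default session name `geobfn` are hypothetical, and the sketch assumes tmux is available inside the container.

```shell
#!/bin/sh
# Sketch: open (or re-attach to) a named tmux session for training.
# train_session and the default name "geobfn" are hypothetical.
train_session() {
    # -A attaches if the session already exists, otherwise creates it.
    tmux new-session -A -s "${1:-geobfn}"
}

# Usage, inside the container:
#   train_session          # opens the session
#   make -f train.mk       # run inside it, then detach with Ctrl-b d
#   train_session          # later: re-attach to check progress
```

Because the same call both creates and re-attaches, there is no need to remember whether a session is already running.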
If you find the idea or code useful for your research, please consider citing
@article{song2024unified,
  title={Unified Generative Modeling of 3D Molecules via Bayesian Flow Networks},
  author={Song, Yuxuan and Gong, Jingjing and Qu, Yanru and Zhou, Hao and Zheng, Mingyue and Liu, Jingjing and Ma, Wei-Ying},
  journal={arXiv preprint arXiv:2403.15441},
  year={2024}
}