Contains the reproduction details for the publication on the wall-time efficient and robust optimal transport Gaussian Process dimer.
This is the third in a series of papers [1, 2] on the Dimer method, focusing on the failure modes of the Gaussian Process Regression model on the 500-system benchmark of Hermes et al. [3].
If you use this repository or its parts, please cite the corresponding publication or data source.
Submitted to ChemPhysChem.
A more accessible form of the same publication is:
Submitted to the Materials Cloud Archive.
Remember to inflate the data using the materialscloud source (section ref:mca) before using the scripts in the repository. Assuming that the .xz files have been downloaded to data relative to the repository root:
export GITROOT="$(git rev-parse --show-toplevel)"
cd "$GITROOT/data"
tar -xf otgpd_alldat.tar.xz && rm -f otgpd_alldat.tar.xz
# Raw benchmark data, i.e., EON output logs
# Create the target directory first so that cp copies into it
mkdir -p "$GITROOT/bench_runs/runs/hpc"
cp "$GITROOT/data/hpc.tar.xz" "$GITROOT/bench_runs/runs/hpc"
cd "$GITROOT/bench_runs/runs/hpc"
tar -xf hpc.tar.xz && rm -f hpc.tar.xz
The repository contains code archives, benchmark runs, and scripts for analysis.
❯ tree -L 2
.
├── CODEOWNERS
├── docs
│   ├── 00_freeform.org
│   ├── 01_hpc.org
│   ├── 03_suppl_viz.org
│   ├── 04_models.org
│   └── 05_suppl_py.org
├── LICENSE
├── pixi.lock
├── pixi.toml
├── readme.org
├── runs
│   ├── automated
│   ├── calc_rundata.py
│   ├── init_condcheck.py
│   └── run_pf.py
├── scripts
│   ├── build_nwchem.sh
│   └── env_setup.sh
└── subrepos
    ├── chemparseplot
    ├── eOn
    ├── gpr_optim
    ├── IterativeRotationsAssignments
    ├── readme.org
    └── rgpycrumbs
The data in the archives expands to locations within the benchmarks.
Each benchmark has the following structure:
.
├── doublet
│   ├── 000
# .....
│   └── 234
└── singlet
    ├── 000
# .....
    └── 264
This comprises 500 systems in total (235 doublets and 265 singlets).
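After inflating the archives, the layout above can be checked for completeness. The following is a minimal sketch, not part of the repository scripts: it assumes the system directories are numbered contiguously (000 to 234 under doublet, 000 to 264 under singlet), and the names `count_systems` and `check_benchmark` are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path

# Assumption: contiguous numbering, so doublet 000..234 and singlet 000..264.
EXPECTED = {"doublet": 235, "singlet": 265}


def count_systems(bench_root):
    """Count the numbered system directories under each spin-state folder."""
    root = Path(bench_root)
    counts = {}
    for spin in EXPECTED:
        spin_dir = root / spin
        if not spin_dir.is_dir():
            # A missing spin-state folder counts as zero systems.
            counts[spin] = 0
            continue
        counts[spin] = sum(
            1 for p in spin_dir.iterdir() if p.is_dir() and p.name.isdigit()
        )
    return counts


def check_benchmark(bench_root):
    """Return a list of mismatches; an empty list means the layout matches."""
    counts = count_systems(bench_root)
    return [
        f"{spin}: found {counts[spin]}, expected {expected}"
        for spin, expected in EXPECTED.items()
        if counts[spin] != expected
    ]
```

Calling `check_benchmark` on a benchmark directory (the exact path depends on where the archives were extracted) returns an empty list when all 500 systems are present, and otherwise one message per spin state with a missing or extra directory.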
For comparisons:
- GPDimer runs
  - Extract from the relevant Materials Cloud archive.
- Dimer (rotation separated) runs
  - From this archive
This repository provides a reproducible setup for generating the benchmarks discussed elsewhere [1 (GitHub), 2 (GitHub)].
- docs/ contains documentation.
- pixi tasks encapsulate dev tasks.
Each of the main tools is mirrored with git-subrepo to ensure reproducibility.
The raw data stores are located on the University of Iceland OneDrive instance and handled via a WebDAV interface to the store.
rclone serve webdav HIOneDrive:/.dvcstore --vfs-cache-mode full --addr localhost:9677
[1] R. Goswami, M. Masterov, S. Kamath, A. Pena-Torres, and H. Jónsson, “Efficient Implementation of Gaussian Process Regression Accelerated Saddle Point Searches with Application to Molecular Reactions,” J. Chem. Theory Comput., Jul. 2025, doi: 10.1021/acs.jctc.5c00866.
[2] R. Goswami, “Bayesian hierarchical models for quantitative estimates for performance metrics applied to saddle search algorithms,” AIP Adv., vol. 15, no. 8, p. 85210, Aug. 2025, doi: 10.1063/5.0283639.
[3] E. D. Hermes, K. Sargsyan, H. N. Najm, and J. Zádor, “Sella, an Open-Source Automation-Friendly Molecular Saddle Point Optimizer,” J. Chem. Theory Comput., vol. 18, no. 11, pp. 6974–6988, Nov. 2022, doi: 10.1021/acs.jctc.2c00395.
MIT. Sub-packages have their own licenses.