TheochemUI/otgpd_repro

About

Contains the reproduction details for the publication on the wall-time efficient and robust optimal transport Gaussian Process dimer.

This is the third in a series of papers [1, 2] on the Dimer method, focusing on the failure modes of the Gaussian Process Regression model on the 500-system benchmark of Hermes et al. [3].

Reference

If you use this repository or any of its parts, please cite the corresponding publication or data source.

Primary Publication

Submitted to ChemPhysChem.

Preprint

A more accessible form of the same publication is:

Data source

Submitted to the Materials Cloud Archive.

Replication data

Before using the scripts in this repository, decompress the data obtained from the Materials Cloud Archive (see Data source above). The commands below assume that the .xz files have been downloaded to the data directory under the repository root:

export GITROOT="$(git rev-parse --show-toplevel)"
cd "$GITROOT/data"
tar -xf otgpd_alldat.tar.xz && rm -f otgpd_alldat.tar.xz
# Raw benchmark data, i.e., EON output logs
# Create the target directory first so that cp copies into it
mkdir -p "$GITROOT/bench_runs/runs/hpc"
cp "$GITROOT/data/hpc.tar.xz" "$GITROOT/bench_runs/runs/hpc/"
cd "$GITROOT/bench_runs/runs/hpc"
tar -xf hpc.tar.xz && rm -f hpc.tar.xz
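
Because the commands above delete each archive after unpacking, it can help to confirm that both files are present before starting. The following is a minimal sketch, assuming the archive names used above; the function name require_archives is illustrative and not part of the repository:

```shell
# Hypothetical helper (not part of the repository): check that both
# archives named in the extraction commands exist in the given directory.
require_archives() {
    dir="$1"
    for f in otgpd_alldat.tar.xz hpc.tar.xz; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing archive: $dir/$f" >&2
            return 1
        fi
    done
}
```

For example, calling require_archives "$GITROOT/data" before the extraction steps returns a non-zero status and names the first missing file.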

Structure

The repository has code archives, benchmark runs, and scripts for analysis.

❯ tree -L 2
.
├── CODEOWNERS
├── docs
│   ├── 00_freeform.org
│   ├── 01_hpc.org
│   ├── 03_suppl_viz.org
│   ├── 04_models.org
│   └── 05_suppl_py.org
├── LICENSE
├── pixi.lock
├── pixi.toml
├── readme.org
├── runs
│   ├── automated
│   ├── calc_rundata.py
│   ├── init_condcheck.py
│   └── run_pf.py
├── scripts
│   ├── build_nwchem.sh
│   └── env_setup.sh
└── subrepos
    ├── chemparseplot
    ├── eOn
    ├── gpr_optim
    ├── IterativeRotationsAssignments
    ├── readme.org
    └── rgpycrumbs

The data in the archives expands to locations within the benchmark directories.

Each benchmark has the following structure:

.
├── doublet
│   ├── 000
│   # .....
│   └── 234
└── singlet
    ├── 000
    # .....
    └── 264

Together these comprise 500 systems: 235 doublets (000 to 234) and 265 singlets (000 to 264).
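
After extraction, the layout can be checked by counting the system directories. The sketch below assumes that each system is an immediate subdirectory of doublet or singlet, as in the listing above; the function names are illustrative, and the expected counts follow from the index ranges 000 to 234 and 000 to 264:

```shell
# Count the immediate subdirectories of a directory (one per system).
count_systems() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d '[:space:]'
}

# Hypothetical check: report the counts and succeed only if the
# benchmark root holds 235 doublet and 265 singlet systems.
check_benchmark() {
    root="$1"
    n_doublet=$(count_systems "$root/doublet")
    n_singlet=$(count_systems "$root/singlet")
    echo "doublet=$n_doublet singlet=$n_singlet total=$((n_doublet + n_singlet))"
    [ "$n_doublet" -eq 235 ] && [ "$n_singlet" -eq 265 ]
}
```

Pass check_benchmark the directory that directly contains doublet and singlet; a non-zero exit status points to an incomplete or misplaced extraction.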

For comparisons:

  • GPDimer runs: extract from the relevant Materials Cloud archive.
  • Dimer (rotation separated) runs: from this archive.

Usage

A reproducible setup for generating the benchmarks discussed elsewhere [1 (GitHub), 2 (GitHub)].

Setup

  • docs/ contains documentation.
  • pixi tasks encapsulate dev tasks.
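
As a starting point, the environment pinned by pixi.lock can be created and the available tasks listed with standard pixi commands. This is a sketch only: the function name setup_env is illustrative, and the task names themselves are defined in pixi.toml and are not reproduced here.

```shell
# Hypothetical wrapper (not part of the repository).
# `pixi install --locked` installs exactly what pixi.lock specifies and
# fails if the lock file is out of date with respect to pixi.toml.
# `pixi task list` prints the tasks defined in pixi.toml.
setup_env() {
    pixi install --locked && pixi task list
}
```

Individual tasks are then run with pixi run followed by a task name taken from that list.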

Design

Sub-repositories

Each of the main tools is mirrored with git-subrepo to ensure reproducibility.
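
git-subrepo records the upstream remote and the pinned commit of each mirrored tool in a .gitrepo file inside its directory, so the pins can be inspected without installing the git-subrepo plugin. A minimal sketch, assuming the subrepos directory shown in the Structure section; the function name is illustrative:

```shell
# Hypothetical helper (not part of the repository): print
# "<name> <remote> <commit>" for every <root>/<name>/.gitrepo file.
# The .gitrepo file uses git-config syntax with a [subrepo] section.
list_subrepo_pins() {
    root="${1:-subrepos}"
    for meta in "$root"/*/.gitrepo; do
        [ -f "$meta" ] || continue
        name=$(basename "$(dirname "$meta")")
        remote=$(git config --file "$meta" subrepo.remote)
        commit=$(git config --file "$meta" subrepo.commit)
        echo "$name $remote $commit"
    done
}
```

Run from the repository root, list_subrepo_pins prints one line per mirrored tool.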

DVC

The raw data stores are located on the University of Iceland OneDrive instance and are accessed through a WebDAV interface, which rclone serves locally:

rclone serve webdav HIOneDrive:/.dvcstore --vfs-cache-mode full --addr localhost:9677
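
With the rclone server running, DVC can be pointed at the local WebDAV endpoint. The sketch below is an assumption about how this might be wired up, not the authors' configuration: the remote name hi_onedrive and the function name are hypothetical, the repository may already define its own remote, and the URL simply mirrors the --addr value above.

```shell
# Hypothetical remote name; the name used by the authors is not stated.
DVC_REMOTE_NAME="hi_onedrive"
# Plain-HTTP WebDAV endpoint matching `--addr localhost:9677` above.
DVC_REMOTE_URL="webdav://localhost:9677"

# Register the endpoint as the default remote in the local (uncommitted)
# DVC config, then fetch the tracked data.
configure_and_pull() {
    dvc remote add --local --default --force "$DVC_REMOTE_NAME" "$DVC_REMOTE_URL" \
        && dvc pull
}
```

Using --local keeps the change out of the committed .dvc/config, so an existing remote definition in the repository is left untouched.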

References

[1] R. Goswami, M. Masterov, S. Kamath, A. Pena-Torres, and H. Jónsson, “Efficient Implementation of Gaussian Process Regression Accelerated Saddle Point Searches with Application to Molecular Reactions,” J. Chem. Theory Comput., Jul. 2025, doi: 10.1021/acs.jctc.5c00866.

[2] R. Goswami, “Bayesian hierarchical models for quantitative estimates for performance metrics applied to saddle search algorithms,” AIP Adv., vol. 15, no. 8, p. 85210, Aug. 2025, doi: 10.1063/5.0283639.

[3] E. D. Hermes, K. Sargsyan, H. N. Najm, and J. Zádor, “Sella, an Open-Source Automation-Friendly Molecular Saddle Point Optimizer,” J. Chem. Theory Comput., vol. 18, no. 11, pp. 6974–6988, Nov. 2022, doi: 10.1021/acs.jctc.2c00395.

License

MIT. Sub-packages have their own licenses.
