
Conversation

@giacomofiorin (Member) commented Apr 15, 2025

This PR revives an old branch containing work begun a few years ago but never finished. At the time, the NAMD GPU-resident code path and the Colvars-GROMACS interface were both under active development. As of today, neither interface supports the features from this branch (they currently work only on the NAMD 2.x CPU code path).

The changes in this PR support dynamically requesting and unrequesting atomic coordinates from the simulation engine. This allows skipping communication for atoms that are not used at a given step, due to multiple time-stepping (MTS) or pairlists/neighbor lists.

This is implemented by reference-counting atoms in the proxy, as well as setting frequency parameters:

  • For volmap variables defined in an internal frame, a new keyword atomListFrequency defines when the full computation is done.
  • For coordination numbers or MTS, the existing keywords pairListFrequency and timeStepFactor are used.
  • The above keywords are combined into a single proxy-side parameter that determines the frequency at which all atoms are re-requested; for NAMD/GlobalMaster, atoms are requested at the step immediately prior and disabled right after.
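A minimal sketch of the proxy-side reference counting described above (class and member names here are hypothetical, chosen for illustration; the actual colvarproxy class carries much more state):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of proxy-side atom reference counting.
// An atom with a positive refcount must have its coordinates
// communicated from the engine; a zero refcount lets the proxy
// skip communication for that atom at the current step.
class atom_refcount_proxy {
public:
  explicit atom_refcount_proxy(std::size_t num_atoms)
    : atoms_refcount(num_atoms, 0) {}

  // A CVC re-requests an atom before the step where it needs it
  void increase_refcount(std::size_t index) {
    atoms_refcount[index] += 1;
  }

  // Release an atom after the full computation; guard against underflow
  void decrease_refcount(std::size_t index) {
    if (atoms_refcount[index] > 0) {
      atoms_refcount[index] -= 1;
    }
  }

  bool is_requested(std::size_t index) const {
    return atoms_refcount[index] > 0;
  }

private:
  std::vector<int> atoms_refcount;
};
```

Multiple CVCs sharing an atom each hold their own reference, so the atom stays requested until every CVC has released it.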

Below is the peak performance for ApoA1 with a restraint on the coordination number between one side chain (4 atoms) and the ~21,000 water oxygens of the system, with a pairlist frequency of 10000 steps. This benchmark was taken on old hardware (InfiniBand cluster with Xeon E5-2650, 32 nodes), where a multinode job gave performance comparable to a modern GPU.

| Condition | ns/day |
| --- | --- |
| unbiased | 84.0 |
| coordNum | 9.1 |
| coordNum+pairList (master) | 12.5 |
| coordNum+pairList (this branch) | 79.6 |

Opening as draft because the implementation appears to be out of date with respect to recent changes in NAMD (CI tests are currently hanging).

@giacomofiorin giacomofiorin requested a review from HanatoK April 15, 2025 21:22
@HanatoK (Member) left a comment:

It looks like this PR has also changed the MTS. Is that a rebasing issue? Sorry, I just noticed that the atom list updating frequency is set to the same value as timeStepFactor.
I haven't tested the PR locally and couldn't figure out why the NAMD tests hung.

}

/// Index of this atom in the internal arrays
inline int array_index() const
@HanatoK (Member):

It looks like I need to synchronize the changes to #788.

Comment on lines +169 to +182
if (((cvm::step_absolute()+1) % atom_list_freq) == 0) {
  for (cvm::atom_iter ai = dyn_atoms->begin();
       ai != dyn_atoms->end(); ai++) {
    proxy->increase_refcount(ai->array_index());
  }
}

if (!is_enabled(f_cvc_dynamic_atom_list)) {
  // If the CVC is not enabling/disabling atoms on its own, then disable
  // them all for the next step
  if (((cvm::step_absolute()) % atom_list_freq) == 0) {
    for (cvm::atom_iter ai = dyn_atoms->begin();
         ai != dyn_atoms->end(); ai++) {
      proxy->decrease_refcount(ai->array_index());
@HanatoK (Member):

These may conflict with #788.

{
  if (atoms_refcount[index] > 0) {
    atoms_refcount[index] -= 1;
  }
@HanatoK (Member):

I was wondering why there is no way to decrease the reference count when working on #788...

Comment on lines 85 to +91
virtual int compute_volmap(int flags,
int volmap_id,
cvm::atom_iter atom_begin,
cvm::atom_iter atom_end,
cvm::real *value,
cvm::real *atom_field);
cvm::real *atom_field,
int *inside);
@HanatoK (Member):

I will need to update the signature change in #788 as well.

@giacomofiorin (Member, Author):

You may not have to if we agree to unify the code path in that PR.

Comment on lines +243 to +250
/// If needed, update the requested list of atoms from this group
virtual int update_requested_atoms(cvm::atom_group *dyn_atoms);
@HanatoK (Member):

I will need to implement this for cvm::atom_group_soa in #788.


outputAppliedForce on
timeStepFactor 0
timeStepFactor 4
@HanatoK (Member):

Why is the MTS test changed?

@giacomofiorin (Member, Author) replied Apr 30, 2025:

0 should not be a legal value, so that needed to be fixed. Other than that, the physical trajectory should not change as long as the value divides 4 (the interval of the bias), so I picked exactly 4.

  update_requested_atoms();
}
if ((cvm::step_relative() % atom_list_frequency()) == 0) {
  // After all-atom computation
@HanatoK (Member):

I may need to implement this in #783 as well. Why is update_requested_atoms() called twice?

@giacomofiorin (Member, Author):

Those two requests happen at consecutive steps: first to recall all atoms that have a positive refcount, then at the next step to prune those that don't.
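The two-step schedule described here can be sketched as a pair of predicates (function names and the freestanding form are illustrative only; in the actual code these checks live inside the CVC update loop):

```cpp
// Illustrative sketch of the two-step request/prune schedule:
// at step (k*freq - 1) all atoms are re-requested, so their
// coordinates are available for the full computation at step
// k*freq; right after that step, unused atoms are released.

// True at the step immediately prior to the full computation
bool request_all_atoms_now(long step, long atom_list_freq) {
  return ((step + 1) % atom_list_freq) == 0;
}

// True at the full-computation step itself, when pruning happens
bool prune_unused_atoms_now(long step, long atom_list_freq) {
  return (step % atom_list_freq) == 0;
}
```

With atom_list_freq = 10, atoms are re-requested at steps 9, 19, 29, ... and pruned at steps 10, 20, 30, ..., so the full list is resident for exactly one step per cycle.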

if (atoms_refcount[i] > 0) {
  // Add a force only if the atom is currently used
  cvm::rvector const &f = atoms_new_colvar_forces[i];
  modifyForcedAtoms().add(atoms_ids[i]);
@HanatoK (Member):

It may work well in the GlobalMaster interface. However, the CudaGlobalMaster interface regards atoms_ids as the atoms requested. What should I do to #783?

@giacomofiorin (Member, Author):

I'm commenting here only on this issue, since your other comments seem to connect to it.

The crux of the problem appears to be that classic GlobalMaster keeps its own list of currently requested atoms, which is dynamic and distinct from colvarproxy::atoms_ids, which is a static list; CudaGlobalMaster, instead, uses atoms_ids directly.

I guess this could be addressed by hosting both lists (static and dynamic) within the proxy object, so that each implementation can use the one it supports.
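The dual-list idea could be sketched like this (a purely illustrative layout under the assumptions above; not the actual colvarproxy data members):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: the proxy hosts both a static list of all atoms
// ever requested (what a CudaGlobalMaster-style interface consumes) and
// a dynamic view of currently active atoms, derived from the refcounts
// (what classic GlobalMaster would pass to the engine each update).
struct proxy_atom_lists {
  std::vector<int> atoms_ids;       // static: grows, never shrinks
  std::vector<int> atoms_refcount;  // parallel to atoms_ids

  // Rebuild the dynamic list of atoms that currently need communication
  std::vector<int> active_atoms() const {
    std::vector<int> result;
    for (std::size_t i = 0; i < atoms_ids.size(); i++) {
      if (atoms_refcount[i] > 0) {
        result.push_back(atoms_ids[i]);
      }
    }
    return result;
  }
};
```

A backend that only understands a static list ignores active_atoms() and communicates everything in atoms_ids; a backend with dynamic requests rebuilds the active list whenever refcounts change.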

@HanatoK force-pushed the atomlist-frequency branch from 662dc98 to 5a6650c on April 22, 2025 15:34
@HanatoK (Member) commented Apr 22, 2025

@giacomofiorin I tried to rebase the code against the latest master branch. The test 000_rmsd-mts_harmonic-fixed is still expected to fail.

@giacomofiorin (Member, Author) replied:

> @giacomofiorin I tried to rebase the code against the latest master branch. The test 000_rmsd-mts_harmonic-fixed is still expected to fail.

If I recall correctly, it still failed originally with an older NAMD version. I'm considering using a shorter timestep to make it more robust.

@HanatoK (Member) commented May 13, 2025

I ran the test on my laptop with an AMD Ryzen 5800H and an NVIDIA RTX 3060 using the GPU-resident mode, and the speed is not that slow.

| Condition | ns/day |
| --- | --- |
| Unbiased | 42.43 |
| coordNum (GlobalMaster) | 32.0685 |
| coordNum (CudaGM+SOA) | 40.19 |

I am curious why it was 9.1 ns/day in your test. My Colvars test file is:

indexFile index.ndx

colvar {
  name cv
  coordNum {
    group1 {
      atomNumbers {5 17 31 55}
    }
    group2 {
      indexGroup waters
    }
    tolerance 0.001
    pairListFrequency 10
    cutoff 10.0
  }
}

harmonic {
  colvars cv
  centers 475.0
  forceConstant 0.001
}

where waters is an atom selection that has 21,458 atoms.

@giacomofiorin (Member, Author) commented May 13, 2025

@HanatoK That's good to see! (Especially the CudaGM result)

The hardware configuration was completely different, so I can't say for sure why there is such a big difference. Some likely factors are your CPU (newer and higher clock) and the fact that in GPU-resident mode the CPU is mostly dedicated to GlobalMaster+Colvars.

IMO, adopting CudaGM as the default interface for GPU-resident NAMD is a higher priority than finishing this PR. I was mostly concerned with mitigating the slowdown from message-passing communication, but it's not a big deal if we end up not supporting this feature where it isn't needed.
