
Conversation

@giacomofiorin (Member) commented Apr 15, 2025

This PR revives an old branch containing work begun a few years ago but never finished. At the time, the NAMD GPU-resident code path and the Colvars-GROMACS interface were both under active development. As of today, neither interface supports the features from this branch (they currently work only on the NAMD 2.x CPU code path).

The changes in this PR support dynamically requesting and unrequesting atomic coordinates from the simulation engine. This allows skipping communication for atoms that are not used at a given step, due to multiple time-stepping (MTS) or pairlists/neighbor lists.

This is implemented by reference-counting atoms in the proxy, as well as setting frequency parameters:

  • For volmap variables defined in an internal frame, a new keyword atomListFrequency defines when the full computation is done.
  • For coordination numbers or MTS, the existing keywords pairListFrequency and timeStepFactor are used.
  • The above keywords are combined into a single proxy-side parameter that determines the frequency at which all atoms are re-requested; for NAMD/GlobalMaster, atoms are requested at the step immediately prior and disabled right after.
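A minimal sketch of the proxy-side reference counting described above (class and member names here are hypothetical, chosen for illustration; the actual colvarproxy class carries much more state):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of proxy-side atom reference counting.
// An atom with a positive refcount must have its coordinates
// communicated from the engine; a zero refcount lets the proxy
// skip communication for that atom at the current step.
class atom_refcount_proxy {
public:
  explicit atom_refcount_proxy(std::size_t num_atoms)
    : atoms_refcount(num_atoms, 0) {}

  // A CVC re-requests an atom before the step where it needs it
  void increase_refcount(std::size_t index) {
    atoms_refcount[index] += 1;
  }

  // Release an atom after the full computation; guard against underflow
  void decrease_refcount(std::size_t index) {
    if (atoms_refcount[index] > 0) {
      atoms_refcount[index] -= 1;
    }
  }

  bool is_requested(std::size_t index) const {
    return atoms_refcount[index] > 0;
  }

private:
  std::vector<int> atoms_refcount;
};
```

Multiple CVCs sharing an atom each hold their own reference, so the atom stays requested until every CVC has released it.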

Below is the peak performance for ApoA1 with a restraint on the coordination number between one side chain (4 atoms) and the ~21,000 water oxygens of the system, with a pairlist frequency of 10000 steps. This benchmark was taken on old hardware (InfiniBand cluster with Xeon E5-2650, 32 nodes), where a multinode job gave performance comparable to a modern GPU.

| Condition | ns/day |
| --- | --- |
| unbiased | 84.0 |
| coordNum | 9.1 |
| coordNum+pairList (master) | 12.5 |
| coordNum+pairList (this branch) | 79.6 |

Opening as draft because the implementation appears to be out of date with respect to recent changes in NAMD (CI tests are currently hanging).

@giacomofiorin giacomofiorin requested a review from HanatoK April 15, 2025 21:22
@HanatoK (Member) left a comment:

It looks like this PR has also changed the MTS. Is that a rebasing issue? Sorry, I just noticed that the atom list updating frequency is set to the same value as timeStepFactor.
I haven't tested the PR locally and couldn't figure out why the NAMD tests hung.

}

/// Index of this atom in the internal arrays
inline int array_index() const
@HanatoK (Member):

It looks like I need to synchronize the changes to #788.

Comment on lines +169 to +182
if (((cvm::step_absolute()+1) % atom_list_freq) == 0) {
  for (cvm::atom_iter ai = dyn_atoms->begin();
       ai != dyn_atoms->end(); ai++) {
    proxy->increase_refcount(ai->array_index());
  }
}

if (!is_enabled(f_cvc_dynamic_atom_list)) {
  // If the CVC is not enabling/disabling atoms on its own, then disable
  // them all for the next step
  if (((cvm::step_absolute()) % atom_list_freq) == 0) {
    for (cvm::atom_iter ai = dyn_atoms->begin();
         ai != dyn_atoms->end(); ai++) {
      proxy->decrease_refcount(ai->array_index());
@HanatoK (Member):

These may conflict with #788.

{
  if (atoms_refcount[index] > 0) {
    atoms_refcount[index] -= 1;
  }
@HanatoK (Member):

I was wondering why there is no way to decrease the reference count when working on #788...

Comment on lines 85 to +91
virtual int compute_volmap(int flags,
int volmap_id,
cvm::atom_iter atom_begin,
cvm::atom_iter atom_end,
cvm::real *value,
cvm::real *atom_field);
cvm::real *atom_field,
int *inside);
@HanatoK (Member):

I will need to update the signature change in #788 as well.

@giacomofiorin (Member, Author):

You may not have to if we agree to unify the code path in that PR.

Comment on lines +243 to +250
/// If needed, update the requested list of atoms from this group
virtual int update_requested_atoms(cvm::atom_group *dyn_atoms);
@HanatoK (Member):

I will need to implement this for cvm::atom_group_soa in #788.


outputAppliedForce on
timeStepFactor 0
timeStepFactor 4
@HanatoK (Member):

Why is the MTS test changed?

@giacomofiorin (Member, Author) replied Apr 30, 2025:

0 should not be a legal value, so that needed to be fixed. Other than that, the physical trajectory should not change as long as the value divides 4 (the interval of the bias), so I picked exactly 4.

  update_requested_atoms();
}
if ((cvm::step_relative() % atom_list_frequency()) == 0) {
  // After all-atom computation
@HanatoK (Member):

I may need to implement this in #783 as well. Why is update_requested_atoms() called twice?

@giacomofiorin (Member, Author):

Those two requests happen at consecutive steps: first to recall all atoms that have a positive refcount, then at the next step to prune those that don't.
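The two-step schedule described here can be sketched as a pair of predicates (function names and the freestanding form are illustrative only; in the actual code these checks live inside the CVC update loop):

```cpp
// Illustrative sketch of the two-step request/prune schedule:
// at step (k*freq - 1) all atoms are re-requested, so their
// coordinates are available for the full computation at step
// k*freq; right after that step, unused atoms are released.

// True at the step immediately prior to the full computation
bool request_all_atoms_now(long step, long atom_list_freq) {
  return ((step + 1) % atom_list_freq) == 0;
}

// True at the full-computation step itself, when pruning happens
bool prune_unused_atoms_now(long step, long atom_list_freq) {
  return (step % atom_list_freq) == 0;
}
```

With atom_list_freq = 10, atoms are re-requested at steps 9, 19, 29, ... and pruned at steps 10, 20, 30, ..., so the full list is resident for exactly one step per cycle.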

if (atoms_refcount[i] > 0) {
  // Add a force only if the atom is currently used
  cvm::rvector const &f = atoms_new_colvar_forces[i];
  modifyForcedAtoms().add(atoms_ids[i]);
@HanatoK (Member):

It may work well in the GlobalMaster interface. However, the CudaGlobalMaster interface regards atoms_ids as the atoms requested. What should I do to #783?

@giacomofiorin (Member, Author):

I'm commenting here only on this issue, since your other comments seem to connect to it.

The crux of the problem appears to be that classic GlobalMaster keeps its own list of currently requested atoms, which is dynamic and distinct from colvarproxy::atoms_ids, which is a static list; CudaGlobalMaster, instead, uses atoms_ids directly.

I guess this could be addressed by hosting both lists (static and dynamic) within the proxy object, so that each implementation can use the one it supports.
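The dual-list idea could be sketched like this (a purely illustrative layout under the assumptions above; not the actual colvarproxy data members):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: the proxy hosts both a static list of all atoms
// ever requested (what a CudaGlobalMaster-style interface consumes) and
// a dynamic view of currently active atoms, derived from the refcounts
// (what classic GlobalMaster would pass to the engine each update).
struct proxy_atom_lists {
  std::vector<int> atoms_ids;       // static: grows, never shrinks
  std::vector<int> atoms_refcount;  // parallel to atoms_ids

  // Rebuild the dynamic list of atoms that currently need communication
  std::vector<int> active_atoms() const {
    std::vector<int> result;
    for (std::size_t i = 0; i < atoms_ids.size(); i++) {
      if (atoms_refcount[i] > 0) {
        result.push_back(atoms_ids[i]);
      }
    }
    return result;
  }
};
```

A backend that only understands a static list ignores active_atoms() and communicates everything in atoms_ids; a backend with dynamic requests rebuilds the active list whenever refcounts change.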

@HanatoK force-pushed the atomlist-frequency branch from 662dc98 to 5a6650c on April 22, 2025 15:34
@HanatoK (Member) commented Apr 22, 2025

@giacomofiorin I tried to rebase the code against the latest master branch. The test 000_rmsd-mts_harmonic-fixed is still expected to fail.

@giacomofiorin (Member, Author) replied:

> @giacomofiorin I tried to rebase the code against the latest master branch. The test 000_rmsd-mts_harmonic-fixed is still expected to fail.

If I recall correctly, it still failed originally with an older NAMD version. I'm considering using a shorter timestep to make it more robust.

@HanatoK (Member) commented May 13, 2025

I ran the test on my laptop with an AMD Ryzen 5800H and an NVIDIA RTX 3060 using the GPU-resident mode, and the speed is not that slow.

| Condition | ns/day |
| --- | --- |
| Unbiased | 42.43 |
| coordNum (GlobalMaster) | 32.0685 |
| coordNum (CudaGM+SOA) | 40.19 |

I am curious why it was 9.1 ns/day in your test. My Colvars test file is:

indexFile index.ndx

colvar {
  name cv
  coordNum {
    group1 {
      atomNumbers {5 17 31 55}
    }
    group2 {
      indexGroup waters
    }
    tolerance 0.001
    pairListFrequency 10
    cutoff 10.0
  }
}

harmonic {
  colvars cv
  centers 475.0
  forceConstant 0.001
}

where waters is an atom selection that has 21,458 atoms.

@giacomofiorin (Member, Author) commented May 13, 2025

@HanatoK That's good to see! (Especially the CudaGM result)

The hardware configuration was completely different, so I can't say for sure why there is such a big difference. Some likely factors are your CPU (newer and higher clock) and the fact that in GPU-resident mode the CPU is mostly dedicated to GlobalMaster+Colvars.

IMO, adopting CudaGM as the default interface for GPU-resident NAMD is a higher priority than finishing this PR. I was mostly concerned with mitigating the slowdown from message-passing communication, but it's not a big deal if we end up not supporting this feature where it isn't needed.
