-
Notifications
You must be signed in to change notification settings - Fork 61
Support dynamic communication of atom buffers #789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this PR has also changed the MTS. Is that a rebasing issue? Sorry, I just noticed that the atom list updating frequency is set to the same as timeStepFactor.
I haven't tested the PR locally and couldn't figure out why the NAMD tests hanged.
| } | ||
|
|
||
| /// Index of this atom in the internal arrays | ||
| inline int array_index() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like that I need to synchronize the changes to #788.
| if (((cvm::step_absolute()+1) % atom_list_freq) == 0) { | ||
| for (cvm::atom_iter ai = dyn_atoms->begin(); | ||
| ai != dyn_atoms->end(); ai++) { | ||
| proxy->increase_refcount(ai->array_index()); | ||
| } | ||
| } | ||
|
|
||
| if (!is_enabled(f_cvc_dynamic_atom_list)) { | ||
| // If the CVC is not enabling/disabling atoms on its own, then disable | ||
| // them all for the next step | ||
| if (((cvm::step_absolute()) % atom_list_freq) == 0) { | ||
| for (cvm::atom_iter ai = dyn_atoms->begin(); | ||
| ai != dyn_atoms->end(); ai++) { | ||
| proxy->decrease_refcount(ai->array_index()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These may conflict with #788.
| { | ||
| if (atoms_refcount[index] > 0) { | ||
| atoms_refcount[index] -= 1; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering why there is no way to decrease the reference count when working on #788...
| virtual int compute_volmap(int flags, | ||
| int volmap_id, | ||
| cvm::atom_iter atom_begin, | ||
| cvm::atom_iter atom_end, | ||
| cvm::real *value, | ||
| cvm::real *atom_field); | ||
| cvm::real *atom_field, | ||
| int *inside); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will need to update the signature change in #788 as well .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may not have to if we agree to unify the code path in that PR.
| /// If needed, update the requested list of atoms from this group | ||
| virtual int update_requested_atoms(cvm::atom_group *dyn_atoms); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will need to implement this for cvm::atom_group_soa in #788.
|
|
||
| outputAppliedForce on | ||
| timeStepFactor 0 | ||
| timeStepFactor 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the MTS test changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0 should not be a legal value, so that needed to be fixed. Other than that the physical trajectory should not change as long as the value divides 4 (the interval of the bias), so I picked 4 exactly
| update_requested_atoms(); | ||
| } | ||
| if ((cvm::step_relative() % atom_list_frequency()) == 0) { | ||
| // After all-atom computation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may need to implement this in #783 as well. Why is update_requested_atoms() called twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those two requests happen at consecutive steps, first to recall all atoms that have positive refcount, then at the next step to prune those that don't.
| if (atoms_refcount[i] > 0) { | ||
| // Add a force only if the atom is currently used | ||
| cvm::rvector const &f = atoms_new_colvar_forces[i]; | ||
| modifyForcedAtoms().add(atoms_ids[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may work well in the GlobalMaster interface. However, the CudaGlobalMaster interface regards atoms_ids as the atoms requested. What should I do to #783?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm commenting here only on this issue, since your other comments seem to connect to it.
The crux of the problem appears to be that classic GlobalMaster has its own list of currently requested atoms, which is dynamic and distinct from colvarproxy::atoms_ids, which is a static list. Instead, CUDAGlobalMaster uses directly atoms_ids.
I guess this could be addressed by hosting both lists (static and dynamic) within the proxy object, so that the different implementations can just use the one that they support.
662dc98 to
5a6650c
Compare
|
@giacomofiorin I tried to rebase the code against the latest master branch. The test |
The refactoring of the NAMD proxy backend leads to failing the 007_map_total_internal regtest. This is due once again to the combination of compiler optimizations and GridForceGrid being based on float instead of double. Updating the test's reference files in a separate commit.
5a6650c to
5731262
Compare
If I recall correctly it still failed originally with an older NAMD version. I'm considering using a shorter timestep to make it more robust |
|
I ran the test on my laptop with AMD Ryzen 5800H and NVIDIA RTX 3060 using the GPU-resident mode, and the speed is not that slow.
I am curious why it was 9.1 ns/day in your test. My Colvars test file is: where waters is an atom selection that has 21,458 atoms. |
|
@HanatoK That's good to see! (Especially the CudaGM result) The hardware config was completely different, so I don't know for sure why the big difference. Some factors are probably your CPU (newer and higher clock) and the fact that in GPU-resident mode the CPU is mostly dedicated to GlobalMaster+Colvars. IMO, adopting CudaGM as the default interface for GPU-resident NAMD is of higher priority than finishing this PR. I was mostly concerned with mitigating the slowdown in message-passing communication, but it's not a big deal if we end up not supporting it when it's not needed. |
This PR revives an old branch containing work began a few years ago but not finished since. At the time, NAMD GPU-resident and the Colvars-GROMACS interface were both under active development. As of today, neither interface supports the features from this branch (they currently work on the NAMD 2.x CPU code path).
The changes in this PR support dynamically requesting and unrequesting atomic coordinates from the simulation engine. This allows skipping communication associated with atoms not being used at a given step, because of multiple time-stepping (MTS) or pairlists/neighbor lists.
This is implemented by reference counting atoms in the proxy, as well as setting frequency parameters:
atomListFrequencyis added to define when the full computation is done.pairListFrequencyandtimeStepFactorare used.Below is the peak performance for ApoA1 with a restraint on the coordination number between one side chain (4 atoms) and the ~21,000 water oxygens of the system and a pairlist frequency of 10000 steps. This benchmark was taken on old hardware (InfiniBand cluster with Xeon E5-2650, 32 nodes), where a multinode job was giving comparable performance to a modern GPU.
Opening as draft because the implementation appears to be out of date with respect to recent changes in NAMD (CI tests are currently hanging).