Add multithreading support to ddsim #1240
Conversation
Test Results: 9 files, 9 suites, 4h 53m 35s ⏱️ Results for commit b078023. |
|
This is still a work in progress, but let me give some more background here as to the plan (the "more later" from the top).
It is likely that I will be doing some rewrites of the last commit (5bf711c) which is a bit too big for my taste and not as clear as one would like for non-squash merging. |
|
The status of this PR after #1417 is that we can now run MT gun simulations, like so: Main missing feature is ensuring |
```python
self._g4gun = None
self._g4gps = None
```
TODO: Geant4 gun and GPS event generators are not initialized correctly due to this removal.
Despite the modified implementation, this remains a TODO since only one thread reads the macroFile... (as of da36a3e). And Geant4GeneratorWrapper doesn't allow setting properties when shared = True.
|
The primary remaining difficulty I'm having here is with the event input files (I'm using HepMC3), which we create in: DD4hep/DDG4/python/DDSim/DD4hepSimulation.py, lines 444 to 446 in 6143d12.

Simply making the GeneratorAction shared with

To get around this, I went through setting up the necessary

So, I'm open to input as to how best to have a single

@MarkusFrankATcernch @andresailer Do you have any suggestions? |
|
Hi Wouter,

This construct is not simple. Let's see what is necessary before the hacking starts... For

Now what one probably wants is somehow a structure that this worker thread bound

From what I see somehow the input stage needs then to be re-designed. In particular exchanging data between the actions building the queue of

Having said all this, it looks necessary that the queue of
How to do something like this in a backwards compatible way is not so obvious to me and should be designed properly. |
|
So after coming back from the drawing board... to an input stage which looks much rather like this (details are left out here):

Here you have
In principle this is not too difficult, but still quite a bit more difficult than you had imagined upfront. In addition:
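The re-designed input stage sketched above — a single reader filling a queue of pre-read events that worker threads consume — can be illustrated in plain Python. This is only a sketch of the pattern; `reader`, `worker`, and the bounded `queue.Queue` are illustrative stand-ins, not DDG4 API:

```python
import queue
import threading

def reader(source, event_queue, n_workers):
    """A single thread reads events in file order and fills the shared queue."""
    for event in source:
        event_queue.put(event)
    for _ in range(n_workers):       # one sentinel per worker signals end-of-input
        event_queue.put(None)

def worker(event_queue, results):
    """Each worker thread pops whole events; it never touches the input file."""
    while True:
        event = event_queue.get()
        if event is None:
            break
        results.append(event * 10)   # stand-in for simulating the event

events = list(range(8))              # stand-in for a HepMC3 input file
q = queue.Queue(maxsize=4)           # a bounded queue throttles the reader
results = []
threads = [threading.Thread(target=worker, args=(q, results)) for _ in range(3)]
for t in threads:
    t.start()
reader(events, q, len(threads))      # reader runs in the main thread here
for t in threads:
    t.join()
print(sorted(results))               # every event processed exactly once
```

Note that the workers finish events in an arbitrary order, which is why any ST-vs-MT output comparison has to match events by index rather than by position.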
|
|
While I agree that the diagram you outline (with an event queue) is optimal for throughput performance, I wonder if there is another less optimal but still functional approach (which may have some threads waiting unnecessarily for some of the time). What I had originally anticipated was a pattern where there are multiple Geant4InputActions sequences, one per thread as now, but the action that reads from disk would have a shared reader that is locked. This would mimic the approach of the Geant4OutputAction where each thread has an output event action, but they share and lock the output file. |
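The less optimal pattern described here — one input action per thread, all delegating to a single reader guarded by a lock, mirroring how the output actions share a locked output file — might look like this as a sketch. The class and method names (`SharedReader`, `PerThreadInputAction`, `next_event`) are illustrative, not DD4hep API:

```python
import threading

class SharedReader:
    """One reader instance for all threads; the lock serializes file access."""
    def __init__(self, events):
        self._events = iter(events)
        self._lock = threading.Lock()

    def next_event(self):
        with self._lock:             # only one thread reads at a time
            return next(self._events, None)

class PerThreadInputAction:
    """Each worker owns its own action, but all actions share one reader."""
    def __init__(self, shared_reader):
        self._reader = shared_reader

    def run(self, out):
        while (ev := self._reader.next_event()) is not None:
            out.append(ev)           # stand-in for handing the event to Geant4

shared = SharedReader(range(100))
out = []
threads = [threading.Thread(target=PerThreadInputAction(shared).run, args=(out,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(out))                      # every event read exactly once, no duplicates
```

Threads that arrive while another holds the lock simply wait, which is the "less optimal but still functional" trade-off: throughput suffers only while the file is actually being read.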
|
@wdconinc Where one actually puts the Geant4EventQueue is sort of user gusto here, but it must be located before the

If we invest here, we should do it properly... even if it costs some development. |
See #1440, which exposes the properties correctly now. |
|
@wdconinc @MarkusFrankATcernch Hi Wouter and Markus, what is the status of including multithreading into ddsim with HEPMC3 file inputs? I am working on a large simulation, and having MT could potentially give a nice boost in throughput. If there are concrete places that I can contribute, please let me know. If the exposure of properties is handled, is this "simply" a matter of having ddsim handle file reads and writes correctly? Or are there more fundamental components to update? Again, keen to help where it makes sense! |
|
@murnanedaniel Depending on your use case, this already works fine (multi-threaded).
I have been using this for a month or two (HepMC3 input files), typically 8 threads, and it scales without loss of efficiency at that (small) thread count. |
|
@wdconinc @murnanedaniel |
The scope of this PR has always been to make multi-threaded running work with ddsim. I don't know what other ways of running DDG4 people are using, but I'm not familiar with them. |
|
Hi @wdconinc. Thanks for the quick response. I have been trying to reproduce your working setup with input HepMC3 files, but get this error when setting to 8 threads:

I removed the elif python handler test that was causing the recursion, leading to (I believe) the real error, which is this:

For context, here is the full output log: full_output.log |
|
Okay, I made a couple of quick tweaks and it does seem to work, which is really promising. Firstly, the tweaks I made are here: murnanedaniel@cbbe86e. I'm not sure if you agree with them, or why they are not necessary in your setup? Secondly, there is one final error appearing after the simulation that doesn't appear in the main DD4hep. You can see it at the end of the successful log file: full_output_mt.log |
|
@wdconinc Just following up here, since we will soon be running a large set of simulations with your MT changes. I just want to ensure that the small error that appears, and the tweaks that I needed to make, are still consistent with the working version of ddsim as you see it. On that same point - have you validated that MT sim of hepmc3 inputs produces more or less identical outputs to DD4hep main? |
|
Is there any plan to have this integrated? |
|
@BrieucF Whenever Wouter gives his OK. |
|
I added what I have locally, which consists mainly of a number of newly added tests that aim to compare single- and multi-threaded output. This requires a comparison script that allows for out-of-order event output (the events must still be identical when compared via an index map). There are some persistent differences between 2- and 4-threaded running; those are still under investigation.
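The core idea of such a comparison — MT output may arrive in any order, but each event should be byte-identical once matched by its event index — can be sketched as a stand-alone check. This is an illustrative sketch, not the actual comparison script from the PR:

```python
def compare_st_mt(st_events, mt_events):
    """st_events / mt_events: lists of (event_index, payload) tuples.
    MT output order is arbitrary, so build an index map before comparing."""
    mt_by_index = {idx: payload for idx, payload in mt_events}
    if len(mt_by_index) != len(st_events):
        return False                 # missing or duplicated events
    return all(mt_by_index.get(idx) == payload for idx, payload in st_events)

st = [(0, "a"), (1, "b"), (2, "c")]
mt = [(2, "c"), (0, "a"), (1, "b")]  # same events, shuffled by thread scheduling
print(compare_st_mt(st, mt))         # True: out of order but identical per index
print(compare_st_mt(st, [(0, "a"), (1, "x"), (2, "c")]))  # False: event 1 differs
```

A real script would additionally have to define which parts of the payload count as comparable, since fields like timestamps or thread-dependent random seeds can legitimately differ.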
Add basic MT functionality tests with 1, 2, and 4 threads, file-based generator tests (HepMC3, EDM4hep), and a comparison script framework for validating ST vs MT equivalence. Tests verify:

- MT mode runs without crashes
- Different thread counts (1, 2, 4) work correctly
- File-based input generators work in MT mode
- Backward compatibility with -j 1 (single-threaded)

Fix double-save bug in EDM4hep/LCIO/ROOT output for ST mode: in single-threaded mode, events were saved twice because setupEDM4hepOutput/setupLCIOOutput/setupROOTOutput hardcoded shared=True. Fixed by making the shared flag conditional on NumberOfThreads > 1.

Fix SIGSEGV crash in MT mode: make EventSeeder shared. setupEventSeeder() was called once per worker thread, creating multiple EventSeeder instances with shared=False. During cleanup this caused conflicts/double-free leading to SIGSEGV. Fixed by creating EventSeeder with shared=True (one instance shared across all workers) and guarding against duplicate creation.

Add tests for G4Gun and GPS with macroFile: these tests document that G4Gun and GPS with macroFile work in ST mode but not in MT mode (macros execute during global init before worker threads exist). Generator setup is guarded with numberOfThreads == 1.

fix: additional DDTest changes
Fixes heap corruption and SIGSEGV crashes when using ROOT output in multi-threaded mode.

Root cause: multiple Geant4 worker threads were accessing ROOT I/O objects concurrently. ROOT's I/O system is not thread-safe by default, causing heap corruption during multi-threaded writes that manifested during exit in TFile::WriteStreamerInfo / TROOT::CloseFiles.

Changes:

1. Call ROOT.EnableThreadSafety() before any ROOT objects are created when numberOfThreads > 1 (MT mode).
2. Add static std::mutex s_rootMutex to Geant4Output2ROOT and protect all ROOT I/O operations with std::lock_guard:
   - commit(): TTree::Fill() and branch operations
   - closeOutput(): file Write() and Close()
   - beginRun(): file creation and opening
   - fill(): branch Fill() operations

The mutex ensures full serialization of ROOT I/O across all worker threads, preventing concurrent access to TFile/TTree/TBranch objects even with ROOT::EnableThreadSafety() in place.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
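The fix pattern — serialize all output I/O across worker threads behind one mutex, the Python analogue of the static std::mutex plus std::lock_guard described above — can be sketched like this. The names `write_lock`, `output`, and `commit` are illustrative stand-ins for the guarded ROOT objects:

```python
import threading

write_lock = threading.Lock()        # analogue of the static std::mutex
output = []                          # stand-in for the shared output file

def commit(record):
    """Every writer takes the lock (like std::lock_guard), so the shared
    'file' never sees two threads writing at once."""
    with write_lock:
        output.append(record)

def worker(thread_id):
    for n in range(50):
        commit((thread_id, n))       # stand-in for TTree::Fill() per event

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(output))                   # 200: no lost or duplicated writes
```

As the commit message notes, fully serializing I/O trades some throughput for correctness; only the write itself is under the lock, so event simulation still proceeds in parallel.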
|
With the permission of Wouter I have changed a few things to fix the tests. All of them are passing now (MT generation and MT comparison, for example, between 2 and 4 threads) except comparing exactly the output of ST and MT.

First, there are some tests that don't get to run because batch mode is not passed and this triggers an error. Fixed in

Then there are a few crashes on CI, and the stack trace is pointing to

Then, the comparison script needed a few setup changes, to load

The last change is to make

Next:
|
This PR adds multithreading support to ddsim, through the `-j` flag.

Backwards-incompatible effects (as the corner cases that they are):

- __setupWorker; this means they may not be available in the same context as global actions. This means that during user physics list initialization one cannot query particle handlers which have not been created yet.

BEGINRELEASENOTES

ENDRELEASENOTES