Description
We use MS2PIP through `MS2PIPFeatureGenerator` (MS2Rescore). In our application (Mascot Server), we run `MS2PIPFeatureGenerator.add_features()` followed by `DeepLCFeatureGenerator.add_features()`. The relevant parts of the environment are:
- Python 3.11.9
- ms2pip 4.0.0
- deeplc 2.2.38
- ms2rescore 3.1.1
- xgboost 1.7.6
On some Windows systems, using the HCD2021 model can crash the process with exit code 3221226505. We have seen this on two customer systems and three internal test systems, but we haven't found any factor that distinguishes the failing systems from those that succeed. It is not explained by the Windows version, processor make or model, number of cores, amount of RAM, presence of a GPU, Visual C++ 2015-2022 Redistributable version, or even the input data set.
However, the crashes correlate strongly with HCD2021. When HCD2021 is used on these systems, the process fails randomly at or after this line:
`DEBUG Converting feature vectors to XGBoost DMatrix...`
The crash can also happen later. We have seen it in the following places:
- After this line:
  `DEBUG Calculating features from predicted spectra`
  In this case, you might also see:
  > Traceback (most recent call last):
  > File "multiprocessing\pool.py", line 131, in worker
  > File "multiprocessing\queues.py", line 374, in put
  > File "multiprocessing\connection.py", line 200, in send_bytes
  > File "multiprocessing\connection.py", line 301, in _send_bytes
  > BrokenPipeError: [WinError 109] The pipe has been ended
- After `MS2PIPFeatureGenerator.add_features()` has finished, for example during `DeepLCFeatureGenerator.add_features()`.
- At the very end of the python.exe process, during Python interpreter cleanup.
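For reference, the exit code 3221226505 is 0xC0000409 in hexadecimal, the NTSTATUS value STATUS_STACK_BUFFER_OVERRUN. Windows also reports this code when a process terminates itself through the `__fastfail` mechanism, so it does not necessarily mean a literal stack overrun. The sketch below decodes such codes when collecting them from several machines. `decode_exit_code` is a hypothetical helper, and its lookup table is a small hand-picked subset of NTSTATUS values, not an exhaustive list.

```python
# A few well-known NTSTATUS crash codes (hand-picked subset, not exhaustive).
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def decode_exit_code(code: int) -> str:
    """Format a Windows exit code as hex plus its NTSTATUS name, if known.

    Accepts both the unsigned form (3221226505) and the signed 32-bit
    form (-1073740791) that some tools print for the same value.
    """
    unsigned = code & 0xFFFFFFFF  # fold negative values into 32-bit unsigned
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


print(decode_exit_code(3221226505))   # prints 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN)
print(decode_exit_code(-1073740791))  # same value in signed form, same output
```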
With HCD2019, we have not seen any crashes. With Immuno-HCD, crashes occur occasionally, but at a fairly low rate.
Turning off multiprocessing makes no difference: even with `num_processes=1`, the crashes still occur when HCD2021 is selected (again, only on the few affected Windows systems).
We also tried running the Linux version inside a Linux VM on a failing Windows system, and it works fine. Running the Windows version inside a Windows VM on a failing Windows system, however, crashes as described above. So the bug appears to be specific to Windows, but it may have some connection to the processor type or hardware.
It looks like a classic buffer overrun in some native code, possibly in XGBoost. It could also be a data race, although the fact that turning off multiprocessing does not help suggests it is not. As an experiment, I updated the Python environment to Python 3.12 with newer versions of MS2PIP and XGBoost, but the crash persisted on the affected Windows systems.
Could the problem lie in the HCD2021 .xgboost weights files? One obvious difference is their combined size:
- HCD2021 (fails): model_20210416_HCD2021_Y.xgboost + model_20210416_HCD2021_B.xgboost = 910 MB
- Immuno-HCD (fails sometimes): model_20210316_Immuno_HCD_Y.xgboost + model_20210316_Immuno_HCD_B.xgboost = 926 MB
- HCD2019 (succeeds): model_20190107_HCD_train_Y.xgboost + model_20190107_HCD_train_B.xgboost = 17 MB
I don't know how to reproduce this fault reliably; for example, it has never failed on my development system, which has made it very difficult to debug. I'll add more information here as it becomes available.