niekdejonge commented Aug 21, 2025

Most of the changes are pure refactoring, which in the end made it straightforward to add balancing across ionization modes. The refactoring steps are:

  • Split the different data generators into separate files; previously they were all in data_generators.py.
  • Renamed SpectrumPairGenerator -> TrainingBatchGenerator, which better captures what the class does.
  • Moved the data augmentation out of TrainingBatchGenerator into a separate file.
  • Refactored the data augmentation to make it more modular and testable.
  • Moved the spectrum picking from TrainingBatchGenerator into InchikeyPairGenerator, and renamed InchikeyPairGenerator to SpectrumPairGenerator.
  • Turned the new SpectrumPairGenerator (formerly InchikeyPairGenerator) into a real generator; previously it was a method that returned a generator.
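The last two steps (spectrum picking moved into the pair generator, and the pair generator becoming a real generator) can be sketched roughly as below. This is a minimal illustration, not the actual ms2deepscore code: the class name `HypotheticalSpectrumPairGenerator`, the list of `(inchikey1, inchikey2)` tuples, the dict mapping each inchikey to its spectra, and the use of plain strings as stand-ins for spectra are all assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


class HypotheticalSpectrumPairGenerator:
    """Sketch of a pair generator that is itself an iterator.

    Hypothetical stand-in for SpectrumPairGenerator: it walks over
    pre-selected inchikey pairs and picks one spectrum per inchikey.
    Spectra are represented by plain strings for illustration only.
    """

    def __init__(
        self,
        inchikey_pairs: Sequence[Tuple[str, str]],
        spectra_per_inchikey: Dict[str, List[str]],
        seed: int = 42,
    ):
        self.inchikey_pairs = inchikey_pairs
        self.spectra_per_inchikey = spectra_per_inchikey
        self._rng = random.Random(seed)
        self._index = 0

    def __iter__(self) -> "HypotheticalSpectrumPairGenerator":
        # The object is its own iterator, so it works directly with next() and for loops
        return self

    def __next__(self) -> Tuple[str, str]:
        if self._index >= len(self.inchikey_pairs):
            raise StopIteration
        inchikey1, inchikey2 = self.inchikey_pairs[self._index]
        self._index += 1
        # Spectrum picking happens here rather than in the batch generator
        spectrum1 = self._rng.choice(self.spectra_per_inchikey[inchikey1])
        spectrum2 = self._rng.choice(self.spectra_per_inchikey[inchikey2])
        return spectrum1, spectrum2
```

With a method that returns a generator, a caller first has to call that method to obtain something iterable; with the object implementing `__iter__`/`__next__` itself, the generator object can be handed to another component and consumed with `next()` directly.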

After these changes (which did not add any new functionality), I added the cross-ionization-mode capabilities. These can all be found in inchikey_pair_selection_cross_ionmode.py. This works as follows: three SpectrumPairGenerators are created, one for pos-pos, one for pos-neg and one for neg-neg pairs. The pos-pos and neg-neg generators work the same as before; for the cross-ionmode (pos-neg) case I had to reimplement the inchikey pair generator and the SpectrumPairGenerator.

These three generators are combined into a single generator, CombinedSpectrumGenerator, which simply loops over the three generators one by one. This CombinedSpectrumGenerator is passed to TrainingBatchGenerator, so TrainingBatchGenerator did not need to be adapted for the new cross-ionmode capability.
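The combination step can be sketched roughly as follows. This is a minimal sketch under assumptions, not the actual CombinedSpectrumGenerator: it assumes "loops over the 3 generators one by one" means a round-robin that takes one pair from each generator in turn, and the class name `HypotheticalCombinedSpectrumGenerator` and the string placeholders for spectra are hypothetical.

```python
from itertools import cycle
from typing import Iterator, List, Tuple

# Hypothetical: spectra are represented by plain string identifiers
SpectrumPair = Tuple[str, str]


class HypotheticalCombinedSpectrumGenerator:
    """Round-robin over several spectrum pair generators (sketch)."""

    def __init__(self, generators: List[Iterator[SpectrumPair]]):
        # cycle() repeats the list of generators: g0, g1, g2, g0, g1, ...
        self._generator_cycle = cycle(generators)

    def __iter__(self) -> "HypotheticalCombinedSpectrumGenerator":
        return self

    def __next__(self) -> SpectrumPair:
        # Take one pair from the next generator in turn.
        # If that generator is exhausted, its StopIteration ends this iterator too.
        return next(next(self._generator_cycle))


# Usage with toy pos-pos, pos-neg and neg-neg generators
pos_pos = iter([("pos_a", "pos_b"), ("pos_c", "pos_d")])
pos_neg = iter([("pos_a", "neg_x"), ("pos_c", "neg_y")])
neg_neg = iter([("neg_x", "neg_y"), ("neg_y", "neg_z")])
combined = HypotheticalCombinedSpectrumGenerator([pos_pos, pos_neg, neg_neg])
pairs = list(combined)
```

Because the combined object exposes the same iterator interface as a single pair generator, a batch generator that only calls `next()` on it does not need to change. A round-robin order also means each stretch of three consecutive pairs contains one pair of each ionization-mode combination.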

Note: the new version is fully backwards compatible: old models work with this version, and new models work with version 2.5.4. So it is safe to merge in that sense.

niekdejonge changed the base branch from main to fix_data_split on August 27, 2025 08:29