-
Notifications
You must be signed in to change notification settings - Fork 51
eliminate internal psm filter for protein inference #2603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eliminate internal psm filter for protein inference #2603
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2603 +/- ##
==========================================
+ Coverage 94.70% 94.72% +0.02%
==========================================
Files 191 191
Lines 19956 20008 +52
Branches 3701 3701
==========================================
+ Hits 18899 18953 +54
Misses 579 579
+ Partials 478 476 -2
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request eliminates the internal PSM filter (qValue < 0.01) previously used in protein inference, allowing the protein analysis to respect externally configured filters. This enables protein inference to work with both Q-value and PEP Q-value filtering consistently.
Changes:
- Removed hardcoded 0.01 Q-value filter from ProteinParsimonyEngine, relying instead on pre-filtered PSM lists
- Added FilterType and FilterThreshold parameters to ProteinScoringAndFdrEngine to support PEP Q-value filtering for protein scoring
- Moved FilterType enum from TaskLayer to EngineLayer for broader accessibility
- Added BestPeptidePEP property to ProteinGroup class to support PEP-based protein sorting
- Updated test expectations to reflect new protein group counts with PEP filtering
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| MetaMorpheus/EngineLayer/FilterType.cs | Moved FilterType enum from TaskLayer to EngineLayer with documentation |
| MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs | Added FilterType/FilterThreshold support and filter-aware PSM threshold checking |
| MetaMorpheus/EngineLayer/ProteinParsimony/ProteinParsimonyEngine.cs | Removed hardcoded 0.01 Q-value filter from parsimony PSM selection |
| MetaMorpheus/EngineLayer/ProteinParsimony/ProteinGroup.cs | Added BestPeptidePEP property and documentation for AllPsmsBelowOnePercentFDR |
| MetaMorpheus/TaskLayer/FilteredPsms.cs | Removed FilterType enum definition (moved to EngineLayer) |
| MetaMorpheus/TaskLayer/SearchTask/PostSearchAnalysisTask.cs | Passed FilterType and FilterThreshold to protein engines, updated protein results text formatting |
| MetaMorpheus/TaskLayer/GlycoSearchTask/PostGlycoSearchAnalysisTask.cs | Applied external filtering to glyco protein analysis PSMs |
| MetaMorpheus/Test/SearchTaskTest.cs | Updated test expectations for consistent filter display and added comprehensive documentation |
| MetaMorpheus/Test/PostSearchAnalysisTaskTests.cs | Updated protein group count expectations (140→145) for PEP filtering |
| MetaMorpheus/Test/ProteinGroupTest.cs | Updated test strings to include new BestPeptidePEP column |
Comments suppressed due to low confidence (2)
MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs:80
- This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.
foreach (var psm in psmList)
{
// Use filter-type-aware threshold check
if (PsmPassesThreshold(psm))
{
if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null))
{
foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer))
{
if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide))
peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm });
else
psmsForThisPeptide.Add(psm);
}
}
}
}
MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs:79
- These 'if' statements can be combined.
if (PsmPassesThreshold(psm))
{
if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null))
{
foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer))
{
if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide))
peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm });
else
psmsForThisPeptide.Add(psm);
}
}
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
MetaMorpheus/TaskLayer/GlycoSearchTask/PostGlycoSearchAnalysisTask.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (2)
MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs:80
- This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.
foreach (var psm in psmList)
{
// Use filter-type-aware threshold check
if (PsmPassesThreshold(psm))
{
if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null))
{
foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer))
{
if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide))
peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm });
else
psmsForThisPeptide.Add(psm);
}
}
}
}
MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs:79
- These 'if' statements can be combined.
if (PsmPassesThreshold(psm))
{
if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null))
{
foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer))
{
if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide))
peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm });
else
psmsForThisPeptide.Add(psm);
}
}
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (3)
MetaMorpheus/TaskLayer/GlycoSearchTask/PostGlycoSearchAnalysisTask.cs:338
- This line still contains hardcoded 0.01 FDR filtering, which contradicts the PR's goal of eliminating internal PSM filters. The
GlycoAccessionAnalysismethod should use the centralizedFilteredPsms.Filterapproach likeGlycoProteinAnalysisdoes to ensure consistent filtering based on the configured filter type and threshold.
List<SpectralMatch> _filteredPsms = psmsForProteinParsimony.Where(p => p.FullSequence != null && p.FdrInfo.QValue <= 0.01 && p.FdrInfo.QValueNotch <= 0.01).ToList();
MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs:80
- This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.
foreach (var psm in psmList)
{
// Use filter-type-aware threshold check
if (PsmPassesThreshold(psm))
{
if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null))
{
foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer))
{
if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide))
peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm });
else
psmsForThisPeptide.Add(psm);
}
}
}
}
MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs:79
- These 'if' statements can be combined.
if (PsmPassesThreshold(psm))
{
if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null))
{
foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer))
{
if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide))
peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm });
else
psmsForThisPeptide.Add(psm);
}
}
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs
Outdated
Show resolved
Hide resolved
| MergeIndistinguishableProteinGroups = mergeIndistinguishableProteinGroups; | ||
| _decoyIdentifiers = proteinGroups.SelectMany(p => p.Proteins.Where(b => b.IsDecoy).Select(b => b.Accession.Split('_')[0])).ToHashSet(); | ||
| _filterType = filterType; | ||
| _filterThreshold = filterThreshold; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The information about the filterThreshold and the filterType can be found in CommonParameters. Could we pull that information from the commonParameters argument? Or is there a situation where we would want the ProteinScoringAndFdrEngine to use different filtering criteria than those defined in commonParams?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| FilterType.PepQValue => psm.FdrInfo.PEP_QValue <= _filterThreshold, | ||
| _ => psm.FdrInfo.QValueNotch <= _filterThreshold && psm.FdrInfo.QValue <= _filterThreshold | ||
| }; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to use the FilteredPsms class for filtering? I like the idea of having all of our filtering logic in one place, as having multiple filtering functions has caused issues in the past
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be done
Alexander-Sol
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can move the FilteredPsms class to the engine layer. This would allow the parsimony and proteinScoring engines to use the filtering logic implemented there, instead of having to re-implement.
Also, double check the call to Filter in the GlycoProteinAnalysis.
…ysisEngine with FilteredPsms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| foreach (var psm in psmList) | ||
| { | ||
| if (psm.FdrInfo.QValueNotch <= 0.01 && psm.FdrInfo.QValue <= 0.01) | ||
| // Use filter-type-aware threshold check | ||
|
|
||
| if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null)) | ||
| { | ||
| if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null)) | ||
| foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer)) | ||
| { | ||
| foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer)) | ||
| { | ||
| if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide)) | ||
| peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm }); | ||
| else | ||
| psmsForThisPeptide.Add(psm); | ||
| } | ||
| if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide)) | ||
| peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm }); | ||
| else | ||
| psmsForThisPeptide.Add(psm); | ||
| } | ||
| } | ||
| } |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.

This pull request introduces several improvements and refactorings to the protein FDR analysis and parsimony pipeline, focusing on making FDR filtering more flexible and clear, as well as improving the reporting of protein group metrics. The main changes include the introduction of a
FilterTypeenum to distinguish between classic q-value and PEP-based filtering, propagation of filter settings throughout the relevant engines, and enhancements to protein group output to include PEP statistics. Several legacy assumptions (such as hardcoded 1% FDR) have been removed in favor of parameterized filtering.FDR Filtering and Propagation Enhancements:
FilterTypeenum inEngineLayer.SpectrumMatchto specify whether q-value or PEP-based filtering is used for PSM, peptide, and protein FDR calculations. (MetaMorpheus/EngineLayer/SpectrumMatch/FilterType.cs)ProteinParsimonyEngineandProteinScoringAndFdrEngineto accept and propagateFilteredPsmsobjects, ensuring consistent use of filter type and threshold throughout the pipeline. (MetaMorpheus/EngineLayer/ProteinParsimony/ProteinParsimonyEngine.cs,MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs) [1] [2] [3] [4]MetaMorpheus/EngineLayer/FdrAnalysis/FdrAnalysisEngine.cs)Protein Group Metric and Output Improvements:
BestPeptidePEPtoProteinGroup, tracked and reported alongside existing metrics, and included it in tab-separated output. (MetaMorpheus/EngineLayer/ProteinParsimony/ProteinGroup.cs) [1] [2] [3]ProteinGroup.AllPsmsBelowOnePercentFDRto clarify that the actual threshold and filter type are configurable and not always 1%. (MetaMorpheus/EngineLayer/ProteinParsimony/ProteinGroup.cs)Sorting and FDR Assignment Logic:
MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs) [1] [2] [3]Codebase Cleanup:
MetaMorpheus/EngineLayer/FdrAnalysis/FdrAnalysisEngine.cs,MetaMorpheus/EngineLayer/SpectrumMatch/FilteredPsms.cs) [1] [2] [3]These changes collectively make the FDR analysis pipeline more robust, flexible, and transparent, supporting both classic and PEP-based filtering and improving the interpretability of protein group outputs.