Skip to content

Conversation

@trishorts
Copy link
Contributor

@trishorts trishorts commented Jan 12, 2026

This pull request introduces several improvements and refactorings to the protein FDR analysis and parsimony pipeline, focusing on making FDR filtering more flexible and clear, as well as improving the reporting of protein group metrics. The main changes include the introduction of a FilterType enum to distinguish between classic q-value and PEP-based filtering, propagation of filter settings throughout the relevant engines, and enhancements to protein group output to include PEP statistics. Several legacy assumptions (such as hardcoded 1% FDR) have been removed in favor of parameterized filtering.

FDR Filtering and Propagation Enhancements:

  • Introduced the FilterType enum in EngineLayer.SpectrumMatch to specify whether q-value or PEP-based filtering is used for PSM, peptide, and protein FDR calculations. (MetaMorpheus/EngineLayer/SpectrumMatch/FilterType.cs)
  • Refactored constructors and internal logic in ProteinParsimonyEngine and ProteinScoringAndFdrEngine to accept and propagate FilteredPsms objects, ensuring consistent use of filter type and threshold throughout the pipeline. (MetaMorpheus/EngineLayer/ProteinParsimony/ProteinParsimonyEngine.cs, MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs) [1] [2] [3] [4]
  • Updated FDR analysis logic to use the correct subset of PSMs according to the selected filter type and threshold, rather than always assuming a 1% FDR cutoff. (MetaMorpheus/EngineLayer/FdrAnalysis/FdrAnalysisEngine.cs)

Protein Group Metric and Output Improvements:

  • Added a new property BestPeptidePEP to ProteinGroup, tracked and reported alongside existing metrics, and included it in tab-separated output. (MetaMorpheus/EngineLayer/ProteinParsimony/ProteinGroup.cs) [1] [2] [3]
  • Enhanced the XML documentation for ProteinGroup.AllPsmsBelowOnePercentFDR to clarify that the actual threshold and filter type are configurable and not always 1%. (MetaMorpheus/EngineLayer/ProteinParsimony/ProteinGroup.cs)

Sorting and FDR Assignment Logic:

  • Refactored protein group sorting and FDR assignment to use the filter type dynamically, so protein groups are ranked by best peptide q-value or PEP as appropriate. (MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs) [1] [2] [3]

Codebase Cleanup:

  • Removed unused or redundant using statements and improved code clarity in several files. (MetaMorpheus/EngineLayer/FdrAnalysis/FdrAnalysisEngine.cs, MetaMorpheus/EngineLayer/SpectrumMatch/FilteredPsms.cs) [1] [2] [3]

These changes collectively make the FDR analysis pipeline more robust, flexible, and transparent, supporting both classic and PEP-based filtering and improving the interpretability of protein group outputs.

@codecov
Copy link

codecov bot commented Jan 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.72%. Comparing base (b3c331d) to head (997b040).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2603      +/-   ##
==========================================
+ Coverage   94.70%   94.72%   +0.02%     
==========================================
  Files         191      191              
  Lines       19956    20008      +52     
  Branches     3701     3701              
==========================================
+ Hits        18899    18953      +54     
  Misses        579      579              
+ Partials      478      476       -2     
Flag Coverage Δ
unittests 94.72% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...pheus/EngineLayer/FdrAnalysis/FdrAnalysisEngine.cs 95.69% <100.00%> (ø)
...pheus/EngineLayer/ProteinParsimony/ProteinGroup.cs 95.76% <100.00%> (+0.04%) ⬆️
...neLayer/ProteinParsimony/ProteinParsimonyEngine.cs 99.19% <100.00%> (+0.40%) ⬆️
...ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs 98.07% <100.00%> (+0.27%) ⬆️
...Morpheus/EngineLayer/SpectrumMatch/FilteredPsms.cs 98.33% <100.00%> (ø)
...yer/GlycoSearchTask/PostGlycoSearchAnalysisTask.cs 91.18% <100.00%> (+0.37%) ⬆️
...us/TaskLayer/MbrAnalysis/SpectralRecoveryRunner.cs 83.87% <ø> (ø)
MetaMorpheus/TaskLayer/MetaMorpheusTask.cs 86.34% <ø> (ø)
...eus/TaskLayer/SearchTask/PostSearchAnalysisTask.cs 90.98% <100.00%> (+0.33%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request eliminates the internal PSM filter (qValue < 0.01) previously used in protein inference, allowing the protein analysis to respect externally configured filters. This enables protein inference to work with both Q-value and PEP Q-value filtering consistently.

Changes:

  • Removed hardcoded 0.01 Q-value filter from ProteinParsimonyEngine, relying instead on pre-filtered PSM lists
  • Added FilterType and FilterThreshold parameters to ProteinScoringAndFdrEngine to support PEP Q-value filtering for protein scoring
  • Moved FilterType enum from TaskLayer to EngineLayer for broader accessibility
  • Added BestPeptidePEP property to ProteinGroup class to support PEP-based protein sorting
  • Updated test expectations to reflect new protein group counts with PEP filtering

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
MetaMorpheus/EngineLayer/FilterType.cs Moved FilterType enum from TaskLayer to EngineLayer with documentation
MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs Added FilterType/FilterThreshold support and filter-aware PSM threshold checking
MetaMorpheus/EngineLayer/ProteinParsimony/ProteinParsimonyEngine.cs Removed hardcoded 0.01 Q-value filter from parsimony PSM selection
MetaMorpheus/EngineLayer/ProteinParsimony/ProteinGroup.cs Added BestPeptidePEP property and documentation for AllPsmsBelowOnePercentFDR
MetaMorpheus/TaskLayer/FilteredPsms.cs Removed FilterType enum definition (moved to EngineLayer)
MetaMorpheus/TaskLayer/SearchTask/PostSearchAnalysisTask.cs Passed FilterType and FilterThreshold to protein engines, updated protein results text formatting
MetaMorpheus/TaskLayer/GlycoSearchTask/PostGlycoSearchAnalysisTask.cs Applied external filtering to glyco protein analysis PSMs
MetaMorpheus/Test/SearchTaskTest.cs Updated test expectations for consistent filter display and added comprehensive documentation
MetaMorpheus/Test/PostSearchAnalysisTaskTests.cs Updated protein group count expectations (140→145) for PEP filtering
MetaMorpheus/Test/ProteinGroupTest.cs Updated test strings to include new BestPeptidePEP column
Comments suppressed due to low confidence (2)

MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs:80

            foreach (var psm in psmList)
            {
                // Use filter-type-aware threshold check
                if (PsmPassesThreshold(psm))
                {
                    if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null))
                    {
                        foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer))
                        {
                            if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide))
                                peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm });
                            else
                                psmsForThisPeptide.Add(psm);
                        }
                    }
                }
            }

MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs:79

  • These 'if' statements can be combined.
                if (PsmPassesThreshold(psm))
                {
                    if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null))
                    {
                        foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer))
                        {
                            if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide))
                                peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm });
                            else
                                psmsForThisPeptide.Add(psm);
                        }
                    }
                }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs:80

            foreach (var psm in psmList)
            {
                // Use filter-type-aware threshold check
                if (PsmPassesThreshold(psm))
                {
                    if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null))
                    {
                        foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer))
                        {
                            if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide))
                                peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm });
                            else
                                psmsForThisPeptide.Add(psm);
                        }
                    }
                }
            }

MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs:79

  • These 'if' statements can be combined.
                if (PsmPassesThreshold(psm))
                {
                    if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null))
                    {
                        foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer))
                        {
                            if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide))
                                peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm });
                            else
                                psmsForThisPeptide.Add(psm);
                        }
                    }
                }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (3)

MetaMorpheus/TaskLayer/GlycoSearchTask/PostGlycoSearchAnalysisTask.cs:338

  • This line still contains hardcoded 0.01 FDR filtering, which contradicts the PR's goal of eliminating internal PSM filters. The GlycoAccessionAnalysis method should use the centralized FilteredPsms.Filter approach like GlycoProteinAnalysis does to ensure consistent filtering based on the configured filter type and threshold.
            List<SpectralMatch>  _filteredPsms = psmsForProteinParsimony.Where(p => p.FullSequence != null && p.FdrInfo.QValue <= 0.01 && p.FdrInfo.QValueNotch <= 0.01).ToList();

MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs:80

            foreach (var psm in psmList)
            {
                // Use filter-type-aware threshold check
                if (PsmPassesThreshold(psm))
                {
                    if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null))
                    {
                        foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer))
                        {
                            if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide))
                                peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm });
                            else
                                psmsForThisPeptide.Add(psm);
                        }
                    }
                }
            }

MetaMorpheus/EngineLayer/ProteinScoringAndFdr/ProteinScoringAndFdrEngine.cs:79

  • These 'if' statements can be combined.
                if (PsmPassesThreshold(psm))
                {
                    if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null))
                    {
                        foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer))
                        {
                            if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide))
                                peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm });
                            else
                                psmsForThisPeptide.Add(psm);
                        }
                    }
                }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

MergeIndistinguishableProteinGroups = mergeIndistinguishableProteinGroups;
_decoyIdentifiers = proteinGroups.SelectMany(p => p.Proteins.Where(b => b.IsDecoy).Select(b => b.Accession.Split('_')[0])).ToHashSet();
_filterType = filterType;
_filterThreshold = filterThreshold;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The information about the filterThreshold and the filterType can be found in CommonParameters. Could we pull that information from the commonParameters argument? Or is there a situation where we would want the ProteinScoringAndFdrEngine to use different filtering criteria than those defined in commonParams?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort of. but it might get changed if the psms count is low. And, maybe there are enough psms when we use all the files to filter w/ pep but when we right the individual file results, one of the files could have < 100 psms and then we need to change the FilterType and threshold.
image

FilterType.PepQValue => psm.FdrInfo.PEP_QValue <= _filterThreshold,
_ => psm.FdrInfo.QValueNotch <= _filterThreshold && psm.FdrInfo.QValue <= _filterThreshold
};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use the FilteredPsms class for filtering? I like the idea of having all of our filtering logic in one place, as having multiple filtering functions has caused issues in the past

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be done

Copy link
Contributor

@Alexander-Sol Alexander-Sol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can move the FilteredPsms class to the engine layer. This would allow the parsimony and proteinScoring engines to use the filtering logic implemented there, instead of having to re-implement.

Also, double check the call to Filter in the GlycoProteinAnalysis.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 54 to 68
foreach (var psm in psmList)
{
if (psm.FdrInfo.QValueNotch <= 0.01 && psm.FdrInfo.QValue <= 0.01)
// Use filter-type-aware threshold check

if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null))
{
if ((TreatModPeptidesAsDifferentPeptides && psm.FullSequence != null) || (!TreatModPeptidesAsDifferentPeptides && psm.BaseSequence != null))
foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer))
{
foreach (var pepWithSetMods in psm.BestMatchingBioPolymersWithSetMods.Select(p => p.SpecificBioPolymer))
{
if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide))
peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm });
else
psmsForThisPeptide.Add(psm);
}
if (!peptideToPsmMatching.TryGetValue(pepWithSetMods, out HashSet<SpectralMatch> psmsForThisPeptide))
peptideToPsmMatching.Add(pepWithSetMods, new HashSet<SpectralMatch> { psm });
else
psmsForThisPeptide.Add(psm);
}
}
}
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.

Copilot uses AI. Check for mistakes.
Alexander-Sol
Alexander-Sol previously approved these changes Jan 20, 2026
@trishorts trishorts merged commit e7a072c into smith-chem-wisc:master Jan 23, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants