
Conversation

@Randomidous

PR Description

Following a discussion here and there, this PR implements support for MNE's reject_by_annotation, excluding BAD-annotated time segments from channel quality assessment. Bad segments (e.g., participant movement during breaks) are only temporarily excluded from the statistical analysis, while the original continuous data structure is preserved in the output.

Note:

Merge Checklist

  • the PR has been reviewed and all comments are resolved
  • all CI checks pass
  • (if applicable): the PR description includes the phrase closes #<issue-number> to automatically close an issue
  • (if applicable): the changes are documented in the changelog changelog.rst
  • (if applicable): new contributors have added themselves to the authors list in the CITATION.cff file

@codecov

codecov bot commented Jan 29, 2026

Codecov Report

❌ Patch coverage is 85.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.73%. Comparing base (34d4773) to head (c01217e).

Files with missing lines Patch % Lines
pyprep/find_noisy_channels.py 85.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #180      +/-   ##
==========================================
- Coverage   98.09%   97.73%   -0.36%     
==========================================
  Files           7        7              
  Lines         734      752      +18     
==========================================
+ Hits          720      735      +15     
- Misses         14       17       +3     


@Randomidous
Author

Randomidous commented Jan 30, 2026

Regarding continuity concerns:

  1. Filtering happens BEFORE segment extraction (lines 98-101):

     if do_detrend:
         self.raw_mne._data = removeTrend(
             self.raw_mne.get_data(), self.sample_rate, matlab_strict=matlab_strict
         )

     The high-pass detrending is applied to the full continuous raw_mne object first. Then, at lines 160-163, segments
     are extracted with reject_by_annotation. This avoids filter edge artifacts at concatenation boundaries.
  2. Different use case: the documentation correctly notes this is for excluding known-bad segments (e.g., movement
    artifacts during breaks) from statistical analysis, not for handling actual recording discontinuities. The data
    structure remains continuous.
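The ordering described in point 1 can be sketched with plain NumPy (a toy stand-in: simple mean removal replaces pyprep's removeTrend, and the BAD spans are invented) — detrend the full continuous array first, and only then drop the annotated spans:

```python
import numpy as np

sfreq = 100.0
n_samples = 1000
rng = np.random.default_rng(0)
data = rng.standard_normal((4, n_samples))  # 4 channels of fake EEG

# Step 1: "detrend" the CONTINUOUS data (mean removal stands in for
# pyprep's removeTrend high-pass) -- no concatenation boundaries yet.
data = data - data.mean(axis=1, keepdims=True)

# Step 2: build a keep-mask from BAD annotations (onset, duration in s).
bad_spans = [(2.0, 1.0), (7.5, 0.5)]  # e.g. movement during breaks
keep = np.ones(n_samples, dtype=bool)
for onset, duration in bad_spans:
    keep[int(onset * sfreq):int((onset + duration) * sfreq)] = False

clean = data[:, keep]  # concatenated clean segments for the statistics
print(clean.shape)  # (4, 850) -- 150 BAD samples dropped
```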

Remaining Concerns:

  1. Bandpass filtering in _get_filtered_data() (lines 171-190): This 1-50 Hz filter is applied to self.EEGData, which
    is already the concatenated data. If a BAD annotation spans a significant portion of the recording, the concatenation
    boundaries could produce filter artifacts.
  2. Correlation windows spanning boundaries: If a 1-second correlation window happens to straddle where two clean
    segments were concatenated, the correlation could be affected by the artificial discontinuity (though the impact is
    likely minimal if the detrending was already done).
  3. RANSAC: Signal prediction operates on the concatenated filtered data, potentially affected by the above.

The MATLAB PREP concerns are mostly addressed for the intended use case (excluding known-bad segments like movement
artifacts), because:

  • The primary filtering (detrending) happens on continuous data before extraction
  • You're not actually concatenating separate recordings with true discontinuities
  • The remaining filtering on concatenated data uses relatively narrow-band filters that are less prone to severe edge
    artifacts

However, there's a theoretical edge case where the bandpass filtering on concatenated data could introduce minor
artifacts. This would only be significant if:

  • Large portions of data are marked BAD
  • Multiple small clean segments get concatenated together

^^^
This could be tested.

For typical use (excluding a few isolated bad segments), this should not materially affect channel quality assessment.
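The "this could be tested" point above can be probed with a toy experiment (NumPy only; a moving-average FIR stands in for the 1-50 Hz bandpass, and the BAD span is invented): compare filter-then-drop against drop-then-filter and look at where they disagree.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
kernel = np.ones(25) / 25.0  # toy low-pass FIR (stand-in for the bandpass)

keep = np.ones(x.size, dtype=bool)
keep[800:1200] = False  # one BAD segment in the middle

# Reference: filter the continuous data, then drop the BAD span.
ref = np.convolve(x, kernel, mode="same")[keep]
# What the later bandpass stage actually sees: concatenate first, filter after.
test = np.convolve(x[keep], kernel, mode="same")

err = np.abs(ref - test)
# Away from the join the two pipelines agree exactly; the error is
# confined to ~half a kernel width around the concatenation boundary.
boundary = 800  # kept-array index where the two clean segments meet
print(err[:700].max(), err[boundary - 12:boundary + 13].max() > 0)
```

With a narrow FIR like this the artifact stays local to the boundary, which matches the intuition that only "many small clean segments" (many boundaries) would accumulate into a real problem.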

Comment on lines +337 to +339
# Without rejection, sample count should be unchanged
nd_no_reject = NoisyChannels(raw_tmp, do_detrend=False, reject_by_annotation=None)
assert nd_no_reject.n_samples == original_samples
Owner


This one pretty much covers test_reject_by_annotation_default from above, right? So I vote to remove test_reject_by_annotation_default

Owner

@sappelhoff sappelhoff left a comment


nice work @Randomidous !

Few comments apart from the inline one(s).

  1. We should explain what we mean by "bad" segments by linking to https://mne.tools/stable/auto_tutorials/preprocessing/20_rejecting_bad_data.html#the-reject-by-annotation-parameter in the docs via something like
See the MNE tutorial on rejecting data spans:
:ref:`mne:tut-reject-data-spans`.

(this should work due to intersphinx)

And in the docstrings we can be more brief and say that it's any annotation starting with BAD or bad

  2. Thanks for your "caveats" comment. I think we should add a check to pyprep that warns a user IF (i) reject_by_annotation is not None AND (ii) the bad segments are "plenty", i.e., not a few "long" bad segments (block breaks), but potentially loads of small bad segments (e.g., where somebody tried to filter out muscle activations via https://mne.tools/stable/generated/mne.preprocessing.annotate_muscle_zscore.html#mne.preprocessing.annotate_muscle_zscore)
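A possible shape for that warning (the function name, thresholds, and message are all invented for illustration — this is not pyprep API):

```python
import warnings

def warn_if_fragmented_bads(bad_durations, max_count=20, short_sec=1.0):
    """Warn when BAD annotations are numerous and mostly short.

    bad_durations: durations (in seconds) of BAD-annotated segments.
    The thresholds are illustrative, not tuned.
    """
    short = [d for d in bad_durations if d < short_sec]
    if len(bad_durations) > max_count and len(short) > len(bad_durations) / 2:
        warnings.warn(
            "Many short BAD segments detected; concatenating numerous "
            "small clean segments may introduce filter artifacts.",
            RuntimeWarning,
        )
        return True
    return False

print(warn_if_fragmented_bads([30.0, 45.0]))  # False: a few long block breaks
print(warn_if_fragmented_bads([0.2] * 50))    # True: muscle-style annotations
```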

Owner

@sappelhoff sappelhoff left a comment


One more point I forgot: I don't think we need the "NaN" option 🤔. In fact, I think it's unnecessarily complicating this. "Omit" should be sufficient.

Oh and of course let's test this a little before merging it. I suggest you run it on your dataset that you are currently working on and compare the performance / sanity WITH and without reject_by_annotation.
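For that WITH/without comparison, a tiny helper could summarize how the two NoisyChannels runs disagree (the helper and channel names are illustrative; the actual runs need a real dataset and are not performed here):

```python
def compare_bad_channels(bads_without, bads_with):
    """Summarize how two bad-channel lists differ."""
    without, with_ = set(bads_without), set(bads_with)
    return {
        "both": sorted(without & with_),
        "only_without_rejection": sorted(without - with_),
        "only_with_rejection": sorted(with_ - without),
    }

# Channels flagged only WITHOUT rejection are likely driven by the BAD
# segments themselves (channel names here are made up).
summary = compare_bad_channels(["Cz", "Fp1", "O2"], ["Cz", "O2", "P7"])
print(summary["only_without_rejection"])  # ['Fp1']
```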


Suggested change
This parameter has no equivalent in MATLAB PREP and is not affected by the
``matlab_strict`` setting.
This parameter has no equivalent in MATLAB PREP. When ``matlab_strict`` is set to ``True``, ``reject_by_annotation`` is automatically set to ``None``.
Owner


forgot to say, but this will of course need code adjustments 👼

Author


Way ahead of you 😁

Comment on lines +124 to +125
"reject_by_annotation is not available in MATLAB PREP. "
f"Setting reject_by_annotation to None (was '{reject_by_annotation}')."
Owner


Suggested change
"reject_by_annotation is not available in MATLAB PREP. "
f"Setting reject_by_annotation to None (was '{reject_by_annotation}')."
"reject_by_annotation is not available in MATLAB PREP. "
f"Setting reject_by_annotation to None (was '{reject_by_annotation}'). "
"To use reject_by_annotation, matlab_strict must be set to False."

Author


This sounds more like an error message than a warning, aye?
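Putting the suggested message together with the guard, the adjustment could look roughly like this (a sketch of the discussed behavior, not pyprep's actual __init__ code):

```python
import warnings

def resolve_reject_by_annotation(reject_by_annotation, matlab_strict):
    """matlab_strict wins over reject_by_annotation.

    Silently forcing None would hide the conflict, so the user is warned
    (or, per the comment above, this could arguably raise instead).
    """
    if matlab_strict and reject_by_annotation is not None:
        warnings.warn(
            "reject_by_annotation is not available in MATLAB PREP. "
            f"Setting reject_by_annotation to None (was '{reject_by_annotation}'). "
            "To use reject_by_annotation, matlab_strict must be set to False."
        )
        return None
    return reject_by_annotation

print(resolve_reject_by_annotation("omit", matlab_strict=False))  # omit
```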



Development

Successfully merging this pull request may close these issues.

[Feature suggestion] Allow for relevant annotation selection during processing. Add a "reject_by_annotation" parameter for finding bad channels
