@Randomidous Randomidous commented Jan 29, 2026

PR Description

Building on @nabilalibou's work (#145), this PR adds functionality for finding bad channels using Power Spectral Density (PSD). This method is not part of the original MATLAB PREP pipeline and only runs when matlab_strict=False.

Detection Criteria

A channel is flagged as "bad-by-PSD" if either:

  1. Abnormally high band power: The channel has an outlier z-score (>3.0) in any of three frequency bands:
    • Low (1-15 Hz): delta, theta, alpha
    • Mid (15-30 Hz): beta
    • High (30-45 Hz): gamma
  2. 1/f violation: The high-frequency band has more power than the low-frequency band, which violates the typical 1/f spectral profile of EEG and often indicates muscle artifact or poor electrode contact.
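
The two criteria above can be sketched as a small function. This is an illustration only, not the code in this PR: the function name `flag_bad_by_psd`, its arguments, and the toy values are hypothetical, and it assumes per-band log power and the corresponding z-scores have already been computed.

```python
import numpy as np


def flag_bad_by_psd(band_power, band_z, z_thresh=3.0):
    """Combine both criteria into one boolean mask (hypothetical helper).

    band_power : (n_channels, 3) log power in the low, mid, and high bands
    band_z     : (n_channels, 3) z-scores of band_power across channels
    """
    band_power = np.asarray(band_power, dtype=float)
    band_z = np.asarray(band_z, dtype=float)
    # Criterion 1: excess power (positive z-score only) in any band.
    bad_by_band = (band_z > z_thresh).any(axis=1)
    # Criterion 2: the high band carries more power than the low band.
    bad_by_1f = band_power[:, 2] > band_power[:, 0]
    return bad_by_band | bad_by_1f


# Toy values, made up for illustration.
band_power = np.array([
    [2.0, 1.0, 0.5],  # typical decreasing profile
    [2.1, 1.1, 0.4],  # typical decreasing profile
    [0.5, 1.0, 1.5],  # inverted profile: high > low
    [6.0, 1.0, 0.5],  # excess low-band power
])
band_z = np.array([
    [0.0, 0.0, 0.2],
    [0.1, 0.3, -0.1],
    [-2.0, 0.0, 2.5],
    [5.0, 0.0, 0.2],
])
bad = flag_bad_by_psd(band_power, band_z)
```

In this toy case the third channel is caught only by the 1/f check and the fourth only by the band check, which is why the two criteria are combined with OR.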

Implementation Details

  • PSD is computed using Welch's method over 1-45 Hz (configurable via fmin/fmax)
  • Default frequency range excludes 50/60 Hz line noise
  • Uses MAD-based robust z-scoring (scaled by 1.4826 to convert to SD units)
  • Only flags positive z-scores (excess power), as abnormally low power may reflect normal topographic variation
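
A minimal sketch of these steps, under stated assumptions: it uses `scipy.signal.welch` (the PR may compute the PSD differently), 2-second Welch windows, mean log10 power per band, and synthetic white noise instead of EEG. The names `log_band_power` and `robust_z` are hypothetical.

```python
import numpy as np
from scipy.signal import welch

# Band edges taken from the PR description.
BANDS = {"low": (1.0, 15.0), "mid": (15.0, 30.0), "high": (30.0, 45.0)}


def log_band_power(data, sfreq, bands=BANDS):
    """Mean log10 PSD per band; data has shape (n_channels, n_times)."""
    # Assumption: 2-second windows (0.5 Hz resolution), capped at data length.
    nperseg = min(data.shape[1], int(2 * sfreq))
    freqs, psd = welch(data, fs=sfreq, nperseg=nperseg, axis=-1)
    log_psd = np.log10(psd + np.finfo(float).tiny)  # avoid log10(0)
    cols = []
    for fmin, fmax in bands.values():
        mask = (freqs >= fmin) & (freqs < fmax)
        cols.append(log_psd[:, mask].mean(axis=1))
    return np.column_stack(cols)


def robust_z(x):
    """MAD-based z-score across channels (axis 0), in SD-equivalent units."""
    med = np.median(x, axis=0)
    mad = np.median(np.abs(x - med), axis=0)
    # 1.4826 * MAD estimates the SD for normal data; NaN where MAD is zero.
    scale = np.where(mad > 0, 1.4826 * mad, np.nan)
    return (x - med) / scale


# Synthetic demo: 8 channels of white noise, channel 3 amplified tenfold.
rng = np.random.default_rng(0)
sfreq = 250.0
data = rng.standard_normal((8, int(20 * sfreq)))
data[3] *= 10.0

band_power = log_band_power(data, sfreq)
z = robust_z(band_power)
excess = z > 3.0  # only positive deviations count as excess power
```

Because the median and MAD are barely moved by a single extreme channel, the amplified channel stands out clearly, whereas a mean/SD z-score would be pulled toward it.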

Changes from previous PR (#145)

  1. Detects only high PSD (not low), since low power can be normal
  2. Uses MAD-based robust statistics instead of standard z-scoring
  3. Splits spectrum into three frequency bands instead of total power
  4. Adds 1/f violation criterion to catch muscle artifacts
  5. Includes bad_by_psd in the bad channels dictionary
  6. Fixes KeyError in Reference.robust_reference() by adding bad_by_psd to the noisy channels tracking dict

Merge Checklist

  • the PR has been reviewed and all comments are resolved
  • all CI checks pass
  • (if applicable): the PR description includes the phrase closes #<issue-number> to automatically close an issue
  • (if applicable): the changes are documented in the changelog changelog.rst
  • (if applicable): new contributors have added themselves to the authors list in the CITATION.cff file

Comment on lines 605 to 606
# Sum log PSD across frequencies for each channel to get total power
total_log_psd = np.sum(log_psd, axis=1)
Owner

I think in this step we lose some info. Imagine a channel that has an abnormally high PSD in low freqs, but lower than normal in high freqs --> the total power will then look like that of a channel that is medium in high and low 🤔
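
A tiny numeric illustration of this masking effect, using made-up log-power values for the low, mid, and high bands (the numbers and variable names are hypothetical):

```python
import numpy as np

# Hypothetical log power per band: (low, mid, high).
typical = np.array([3.0, 2.0, 1.0])
skewed = np.array([5.0, 2.0, -1.0])  # too much low, too little high

# Summing across frequencies hides the difference ...
same_total = bool(np.isclose(typical.sum(), skewed.sum()))
# ... while a per-band comparison exposes it.
per_band_diff = skewed - typical
```

Both channels sum to the same total, yet the second deviates by two units in opposite directions in the low and high bands.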

Owner

@sappelhoff sappelhoff Jan 29, 2026

just as a rule of thumb, we could try dividing into 3 "band power" bins (and only within bins summing up to total log psd):

  1. 0-15Hz
  2. 15-30Hz
  3. 30-45Hz

a channel would be bad if the bin in 30-45Hz has more power than that of 0-15Hz.

a channel could also be bad if it is abnormal from other channels in any one of these three bins

a channel could also be bad if any ratio of bands is abnormal (1/2, 1/3, 2/3)

Author

Added.

Author

Removed third criterion

@Randomidous
Author

After testing with some artifact-laden and some fairly clean data, I figured that the thresholds need to be adapted or some criteria removed (eyeing bad_by_ratio atm).

See plots below:
Noisy data:
image

Less noisy data (note still 7 channels removed):
image

@Randomidous
Author

Removing bad_by_ratio criterion:

What it did: For each channel, it computed the ratio of power between frequency bands (low/mid, low/high, mid/high), then flagged channels where any of these ratios was unusual compared to other channels.

The rationale: The idea was to catch channels with abnormal spectral shape even if absolute power was normal. For example:

  • A channel with normal total power but unusually flat spectrum (similar power across bands)
  • A channel with an unusual bump in one band relative to others

Why it's problematic:

  1. There's natural variation in spectral shape across the scalp (frontal vs. occipital channels have different alpha power, etc.). Imagine doing workload analysis and using this method.
  2. Three separate ratio tests with OR logic means roughly 3x the chance of false positives (I should have known better)
  3. If a channel passes bad_by_band (absolute power is normal in all bands), the spectral shape is probably fine too; ratio deviations at that point are likely just normal topographic variation
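
The false-positive point can be checked with a quick calculation. This sketch assumes independent tests and a one-sided z > 3 cutoff on normally distributed scores; the actual ratio tests were not independent (low/high is the product of low/mid and mid/high), so treat the result as a rough illustration. The function names are hypothetical.

```python
import math


def one_sided_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def any_false_positive(p_single, n_tests):
    """Chance that at least one of n independent tests fires by chance."""
    return 1.0 - (1.0 - p_single) ** n_tests


p_one = one_sided_tail(3.0)            # about 0.00135 per test
p_three = any_false_positive(p_one, 3)  # just under three times p_one
```

For small per-test rates, 1 - (1 - p)^3 is approximately 3p, which matches the "3x" estimate above.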

The remaining criteria are more robust:

  • bad_by_band: Catches genuinely abnormal power (too high or too low)
  • bad_by_1f_violation: Catches channels where high-freq > low-freq power, which violates the fundamental 1/f characteristic of EEG and indicates noise/artifact

codecov bot commented Jan 30, 2026

Codecov Report

❌ Patch coverage is 98.07692% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 98.08%. Comparing base (34d4773) to head (1292494).

Files with missing lines         Patch %   Lines
pyprep/find_noisy_channels.py    98.07%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #179      +/-   ##
==========================================
- Coverage   98.09%   98.08%   -0.01%     
==========================================
  Files           7        7              
  Lines         734      785      +51     
==========================================
+ Hits          720      770      +50     
- Misses         14       15       +1     

@Randomidous
Author

Without the ratio criterion, we still see a lot of channels being removed for noisy data. Some of these are clearly bad but others I'd rather keep in the dataset. So I will dig a bit deeper.

image

@Randomidous
Author

> Without the ratio criterion, we still see a lot of channels being removed for noisy data. Some of these are clearly bad but others I'd rather keep in the dataset. So I will dig a bit deeper.

I mean, if a cleaning method fails on noisy data, then what's the point 🤔

@Randomidous
Author

Randomidous commented Jan 30, 2026

Considering this atm:

  • only mark union of bad channels?
  • raise z-score thresh (on which grounds?)
  • only consider positive z-scores? (since lower than average is not too sus)

@Randomidous
Author

Randomidous commented Jan 30, 2026

Will check only flagging channels with abnormally HIGH power (positive z-scores). Excess power is more reliably indicative of problems (muscle artifact, noise, bad contact causing interference). Low power is ambiguous.
Getting closer and closer to the HF noise criterion.

@Randomidous
Author

Randomidous commented Jan 30, 2026

This seems to work splendidly!

Below, in grey, are the bad channels identified only by bad_by_psd:
image

These are all channels flagged as bad by PSD (as opposed to those flagged only by PSD):
image

Running it on the same clean data from earlier only returns a single bad channel by PSD:
image

@Randomidous
Author

The current implementation seems promising and will be tested on a complete dataset next week :)

Owner

@sappelhoff sappelhoff left a comment

Nice work here @Randomidous, looking forward to seeing the test results next week.

Owner

@sappelhoff sappelhoff left a comment

@nabilalibou please feel free to have a look at this PR, which is in the same vein as what you worked on in 2024.

Please also supply us with your details so we can add you to CITATION.cff, to give you some credit for getting this started at some point.

Follow this example, please:

    - given-names: Roy Eric
      family-names: Wieske
      affiliation: 'Biopsychology and Neuroergonomics, Technische Universität Berlin, Berlin, Germany'
      orcid: 'https://orcid.org/0009-0006-2018-1074'
