Skip to content

Conversation

@ecomodeller
Copy link
Member

@ecomodeller ecomodeller commented Dec 4, 2025

Problem

Fixes #157

When using the binsize parameter in gridded_skill() without explicitly defining bin edges, the auto-generated bins don't cover the full spatial range of the data. This causes data points at the boundaries to be excluded from all bins, resulting in data loss.

Example

With track data spanning from latitude 31.4° to 65.62°, using binsize=0.25 without explicit bin edges causes approximately 4° of data (~400 km of satellite data) to be excluded.

Solution

Modified _add_spatial_grid_to_df() in src/modelskill/comparison/_utils.py to:

  1. Calculate bin edges that cover the full data range (from min to max)
  2. Align bins to multiples of binsize relative to the mean (maintains consistency)
  3. Ensure no data points are excluded at the boundaries

Key Changes

  • Replaced center-based bin calculation with min/max-based calculation
  • Bins now start at or before the minimum data value
  • Bins now extend to or beyond the maximum data value
  • All 532 data points now included (previously 17 points were excluded)

Testing

  • ✅ New test test_gridded_skill_binsize_no_data_loss verifies all data points are included
  • ✅ Updated existing test expectation (5 bins instead of 4 - needed to cover full range)
  • ✅ All gridded_skill tests pass
  • ✅ All track comparison tests pass

Commits

  1. Add failing test demonstrating the bug (17 points lost)
  2. Implement fix and update test expectations

The test demonstrates that when using the binsize parameter without
explicit bin edges, some data points at the spatial boundaries are
excluded from the bins, resulting in data loss.
- Modified bin calculation to ensure bins cover full data range (min to max)
- Bins are aligned to multiples of binsize relative to the mean for consistency
- Updated test expectation: now correctly creates 5 bins instead of 4
- All 532 data points are now included (previously 17 points were excluded)
@ecomodeller ecomodeller marked this pull request as ready for review December 4, 2025 10:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Problem with bin size in spatial_skill

2 participants