Verify HSP* and Update RegressTest to run under Pandas 3.0.0 #209

@rburghol

Relevant to PR #216, which achieved preliminary successful model runs under pandas 3.0.0 and eliminates deprecated pandas methods and argument type restrictions.

  • tests/convert/regression_base.py under pandas 3.0.0 fails with an error pandas.errors.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
  • RegressTest class imports HDF5 from src/hsp2/hsp2tools/HDF5.py whereas main.py imports HDF5 from src/hsp2/hsp2io/hdf.py
    • Need to explore whether the hsp2io:HDF5:read_ts() and hsp2tools:HDF5:get_time_series() behave the same way
    • Need to understand if these need to be converged (recall a previous planning group meeting indicated this as important). @timcera
  • Verify that closing the HDF5 file at the end of the run eliminates this pytest error under pandas >= 3.0.0: ValueError: The file '/opt/model/HSPsquared/tests/test10/HSPFresults/test10.h5' is already opened, but not in read-only mode (as requested).
  • Why does HSP2 VOL timeseries have the misaligned timestamp? (see Table 1)
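
For the read_ts() versus get_time_series() question above, a small comparison helper may be a useful starting point. The sketch below does not call either method, since their signatures are not shown in this issue. It assumes each reader has already returned a pandas Series, and `compare_series` is a hypothetical name. Index and values are checked separately so that a timestamp problem is not mistaken for a numeric one. The sample numbers are the first three rows of Table 1.

```python
import numpy as np
import pandas as pd


def compare_series(a: pd.Series, b: pd.Series, rtol: float = 1e-5) -> dict:
    """Compare two series on their index and, separately, on values by position."""
    same_length = len(a) == len(b)
    return {
        "same_length": same_length,
        "same_index": same_length and bool(a.index.equals(b.index)),
        "index_dtypes": (str(a.index.dtype), str(b.index.dtype)),
        # Positional comparison: ignores the index entirely.
        "values_close": same_length and bool(
            np.allclose(
                a.to_numpy(dtype="float64"),
                b.to_numpy(dtype="float64"),
                rtol=rtol,
            )
        ),
    }


# First three rows of Table 1 (hsp2 index as printed, hspf index as printed).
idx_a = pd.to_datetime(
    ["1970-01-03 04:35:06.0", "1970-01-03 04:35:09.6", "1970-01-03 04:35:13.2"]
)
idx_b = pd.to_datetime(
    ["1976-01-01 01:00:00", "1976-01-01 02:00:00", "1976-01-01 03:00:00"]
)
series_a = pd.Series([29.916803, 29.834137, 29.751753], index=idx_a)
series_b = pd.Series([29.916805, 29.834139, 29.751757], index=idx_b)

report = compare_series(series_a, series_b)
print(report)
```

With these inputs the values agree by position while the indices do not, which is the pattern Table 1 shows.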

Misaligned SEDTRN timeseries produces pandas 3.0.0 test error.

  • The error is thrown in the pandas 3.0.0 tests. Either pandas 2.3.3 does not produce this misalignment, or the indices have shifted from a simple sequential integer to a timestamp.
  • The time index in the hsp2 series is badly off: it starts in 1970 and advances in 3.6-second steps, whereas the HSPF series is hourly and starts in 1976 (see Table 1).
  • The failing test code finds hsp2 timesteps where the flow volume is < 1.0e-4 and sets the corresponding SSED values to 0.0. It then tries to set the same timeslots to 0.0 in the HSPF SSED timeseries, which raises the error because the timestamp indices are misaligned.
            idx_low_vol = ts_vol_hsp2 < 1.0e-4
            ts_hsp2.loc[idx_low_vol] = ts_hsp2.loc[idx_low_vol] = 0
            ts_hspf.loc[idx_low_vol] = ts_hspf.loc[idx_low_vol] = 0
  • IMO the fact that the test tries this at all is an error: we should not zero the HSPF SSED on timesteps when hsp2 has minuscule flow, since this could paper over actual differences in the SSED simulation. I think it would be correct to set HSPF SSED to 0.0 when HSPF volume is < 1.0e-4.
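
The per-model masking suggested above could look like the following sketch. It assumes an HSPF volume series is available to the test (called `ts_vol_hspf` here, a hypothetical name that does not appear in the quoted test code), and the helper `zero_low_volume` is also made up for illustration. Each SSED series is masked with its own model's volume series, so a boolean mask is never applied across two different indices. The SSED and volume numbers are toy values, not model output.

```python
import pandas as pd

LOW_VOLUME = 1.0e-4  # threshold used in the existing test


def zero_low_volume(
    ts: pd.Series, ts_vol: pd.Series, threshold: float = LOW_VOLUME
) -> pd.Series:
    """Return a copy of ts with 0.0 wherever ts_vol (same index) is below threshold."""
    if not ts.index.equals(ts_vol.index):
        raise ValueError("ts and ts_vol must share the same index")
    # Series.mask replaces entries where the condition is True.
    return ts.mask(ts_vol < threshold, 0.0)


# Two deliberately different indices, as in Table 1.
idx_hsp2 = pd.to_datetime(
    ["1970-01-03 04:35:06.0", "1970-01-03 04:35:09.6", "1970-01-03 04:35:13.2"]
)
idx_hspf = pd.to_datetime(
    ["1976-01-01 01:00:00", "1976-01-01 02:00:00", "1976-01-01 03:00:00"]
)

# Toy values for illustration only.
ts_hsp2 = pd.Series([29.9, 0.5, 0.0], index=idx_hsp2)
ts_vol_hsp2 = pd.Series([10.0, 5.0e-5, 0.0], index=idx_hsp2)
ts_hspf = pd.Series([29.9, 0.4, 0.0], index=idx_hspf)
ts_vol_hspf = pd.Series([10.0, 2.0e-4, 0.0], index=idx_hspf)  # hypothetical input

masked_hsp2 = zero_low_volume(ts_hsp2, ts_vol_hsp2)
masked_hspf = zero_low_volume(ts_hspf, ts_vol_hspf)
print(masked_hsp2.tolist())
print(masked_hspf.tolist())
```

In this toy case the second HSPF value is left alone because the HSPF volume is above the threshold, even though the hsp2 volume at the same position is not. That is the kind of difference the current test would hide.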

Table 1: Timeseries output from hsp2 versus hspf showing a timestamp disagreement.

>>> ts_hsp2
index
1970-01-03 04:35:06.000    29.916803
1970-01-03 04:35:09.600    29.834137
1970-01-03 04:35:13.200    29.751753
                             ...
1970-01-03 13:21:57.600     0.000000
1970-01-03 13:22:01.200     0.000000
1970-01-03 13:22:04.800     0.000000
Name: SSED3, Length: 8784, dtype: float32
>>> ts_hspf
1976-01-01 01:00:00    29.916805
1976-01-01 02:00:00    29.834139
1976-01-01 03:00:00    29.751757
                         ...
1976-12-31 22:00:00     0.000000
1976-12-31 23:00:00     0.000000
1977-01-01 00:00:00     0.000000
Name: SSEDCLAY, Length: 8784, dtype: float64
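
A quick arithmetic check on Table 1, offered as a hypothesis rather than a confirmed diagnosis: the hsp2 timestamps are exactly what results if the HSPF timestamps are stored as microseconds since 1970-01-01 and that integer is then read back as nanoseconds, a factor of 1000 in elapsed time. This would fit with pandas 3.0 no longer always defaulting to nanosecond resolution, but where in the HSP2 read path such a reinterpretation might occur has not been traced. The sketch uses only the standard library, and `misread_us_as_ns` is a hypothetical name.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def misread_us_as_ns(ts: datetime) -> datetime:
    """Take microseconds-since-epoch for ts and reinterpret that integer as nanoseconds."""
    us_since_epoch = (ts - EPOCH) // timedelta(microseconds=1)
    # The same integer read as nanoseconds spans 1000x less elapsed time.
    return EPOCH + timedelta(microseconds=us_since_epoch // 1000)


first = misread_us_as_ns(datetime(1976, 1, 1, 1, 0, 0))
second = misread_us_as_ns(datetime(1976, 1, 1, 2, 0, 0))
last = misread_us_as_ns(datetime(1977, 1, 1, 0, 0, 0))

print(first)   # 1970-01-03 04:35:06
print(second)  # 1970-01-03 04:35:09.600000
print(last)    # 1970-01-03 13:22:04.800000
```

The first, second, and last values match the corresponding ts_hsp2 rows in Table 1, and the hourly step shrinks to the 3.6 seconds seen there.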

Table 2: Excerpt of error text from https://github.com/respec/HSPsquared/actions/runs/21381288537/job/61548511129?pr=207

tests/test_regression.py:49: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tests/convert/regression_base.py:225: in check_con
    ts_hsp2, ts_hspf = self.validate_time_series(
tests/convert/regression_base.py:329: in validate_time_series
    ts_hspf.loc[idx_low_vol] = ts_hspf.loc[idx_low_vol] = 0
    ^^^^^^^^^^^^^^^^^^^^^^^^

...
E               pandas.errors.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

/opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/indexing.py:2677: IndexingError
