@stephprince stephprince commented May 29, 2024

Motivation

Addresses several issues summarized in #1808. This is a breaking change for the next major release.

This PR also modifies the validate method so that it:

  1. accepts a single path as input (the CLI still accepts multiple paths)
  2. no longer returns a status code; it returns errors if the function fails while performing validation (the CLI still returns an exit code)
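
The split described above (a function that handles one path and reports errors, and a CLI that loops over several paths and owns the exit code) can be sketched roughly as follows. This is a minimal illustration, not pynwb's implementation: validate_path and main are hypothetical names, the "empty file" check is a placeholder for real validation, and the sketch assumes that a failure to perform validation surfaces as an exception, which is only one reading of item 2.

```python
import sys
from pathlib import Path


def validate_path(path):
    """Hypothetical stand-in for a single-path validate function.

    Returns a list of error messages (an empty list means valid).
    Assumption: if validation cannot be performed at all, an exception
    is raised instead of being encoded in a status code.
    """
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"cannot validate missing file: {p}")
    errors = []
    if p.stat().st_size == 0:
        # Placeholder rule standing in for real schema validation.
        errors.append(f"{p}: file is empty")
    return errors


def main(argv):
    """Hypothetical CLI wrapper: many paths in, one exit code out."""
    exit_code = 0
    for path in argv:
        try:
            errors = validate_path(path)
        except OSError as exc:
            print(f"{path}: could not validate ({exc})", file=sys.stderr)
            exit_code = 1
            continue
        for err in errors:
            print(err, file=sys.stderr)
        if errors:
            exit_code = 1
    return exit_code
```

Keeping the exit code in the wrapper means library callers only ever deal with a list of errors, while shell users still get a conventional zero or nonzero status.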

TODO

  • test with nwbinspector (as of now should only require changes to this line)
  • check test coverage
  • finish testing with ZarrIO with file path validation (bump to later PR)
  • when publishing the next release on conda-forge, the recipe file should also be updated to add the pynwb-validate entry point

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

codecov bot commented May 29, 2024

Codecov Report

Attention: Patch coverage is 96.77419% with 4 lines in your changes missing coverage. Please review.

Project coverage is 92.69%. Comparing base (e47cd5a) to head (146d0e1).

Files with missing lines    Patch %   Lines
src/pynwb/__init__.py       78.57%    2 Missing and 1 partial ⚠️
src/pynwb/validation.py     98.55%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@                Coverage Diff                @@
##           release-3.0.0    #1911      +/-   ##
=================================================
- Coverage          92.69%   92.69%   -0.01%     
=================================================
  Files                 27       28       +1     
  Lines               2684     2710      +26     
  Branches             706      709       +3     
=================================================
+ Hits                2488     2512      +24     
- Misses               127      128       +1     
- Partials              69       70       +1     
Flag          Coverage Δ
integration   73.14% <96.77%> (+0.26%) ⬆️
unit          83.63% <44.35%> (-0.45%) ⬇️

@rly rly added this to the 3.0 milestone Oct 10, 2024
test.py (outdated)
      try:
          with pynwb.NWBHDF5IO(nwb, mode='r') as io:
-             errors = pynwb.validate(io)
+             errors = validate(io, use_cached_namespaces=False)  # previously io did not validate against cached namespaces
Contributor

Suggested change
- errors = validate(io, use_cached_namespaces=False)  # previously io did not validate against cached namespaces
+ errors = validate(io, use_cached_namespaces=False)
+ errors.extend(validate(io, use_cached_namespaces=True))
  1. I think the original comment here might not make sense in the future.
  2. The example NWB files are all generated using the version of NWB being tested, and because pynwb caches the spec by default, there should be no difference between using cached namespaces and not.
  3. Since pynwb.validate and pynwb-validate should now behave the same, we don't really need this test anymore: the pynwb-validate test below covers it. But since validate_nwbs() is deliberately conservative and tests every combination, for consistency I suggest we validate with both use_cached_namespaces=True and use_cached_namespaces=False.
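
For illustration, validating under both settings and pooling the results might look like the sketch below. fake_validate is a hypothetical stand-in that simply returns a list of error strings; it is not pynwb.validate, and the error it reports is invented for the example. The point is only how the two result lists are combined.

```python
def fake_validate(io, use_cached_namespaces):
    """Hypothetical stand-in that returns a list of error strings."""
    # Pretend only the cached-namespace run reports a problem.
    if use_cached_namespaces:
        return [f"{io}: example error (cached namespaces)"]
    return []


def validate_both_ways(io):
    """Collect errors from both settings into one flat list."""
    errors = []
    for use_cached in (False, True):
        # extend keeps the result a flat list of errors; append would
        # nest each returned list as a single element.
        errors.extend(fake_validate(io, use_cached_namespaces=use_cached))
    return errors
```

With a flat list, a simple truthiness check on the combined result still means "at least one error was found under either setting".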

Contributor Author

@stephprince stephprince Dec 23, 2024

If I set use_cached_namespaces=True here, I get several errors when running the validate_nwbs() section of test.py. I had added use_cached_namespaces=False so that it matched the previous test behavior, but if there should be no difference between using the cached namespaces and not in this particular case, then maybe these errors indicate another issue?

You can replicate this by running test.py; the errors are shown below. Maybe the mylab extension generation needs to be updated?

2024-12-23 10:12:02,969 - INFO - Validating with pynwb.validate method.
Traceback (most recent call last):
  File "/Users/smprince/anaconda3/envs/pynwb/lib/python3.11/site-packages/hdmf/validate/validator.py", line 280, in get_validator
    return self.__validators[dt]
           ~~~~~~~~~~~~~~~~~^^^^
KeyError: 'NWBFile'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/smprince/Documents/code/pynwb/test.py", line 172, in validate_nwbs
    errors = validate(io, use_cached_namespaces=True)  # previously io did not validate against cached namespaces
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smprince/anaconda3/envs/pynwb/lib/python3.11/site-packages/hdmf/utils.py", line 672, in func_call
    return func(**pargs)
           ^^^^^^^^^^^^^
  File "/Users/smprince/Documents/code/pynwb/src/pynwb/validation.py", line 191, in validate
    validation_errors += _validate_helper(io=io, namespace=validation_namespace)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smprince/Documents/code/pynwb/src/pynwb/validation.py", line 19, in _validate_helper
    return validator.validate(builder)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smprince/anaconda3/envs/pynwb/lib/python3.11/site-packages/hdmf/utils.py", line 668, in func_call
    return func(args[0], **pargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smprince/anaconda3/envs/pynwb/lib/python3.11/site-packages/hdmf/validate/validator.py", line 298, in validate
    validator = self.get_validator(dt)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smprince/anaconda3/envs/pynwb/lib/python3.11/site-packages/hdmf/utils.py", line 668, in func_call
    return func(args[0], **pargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smprince/anaconda3/envs/pynwb/lib/python3.11/site-packages/hdmf/validate/validator.py", line 283, in get_validator
    raise ValueError(msg)
ValueError: data type 'NWBFile' not found in namespace mylab

Contributor Author

@stephprince stephprince Dec 23, 2024

I think these lines:

    ns_builder.include_type("ElectricalSeries", namespace="core")
    ns_builder.include_type("NWBDataInterface", namespace="core")

could be updated to:

    ns_builder.include_namespace("core")

to fix these errors. I think this change more closely matches the latest version of the ndx-template create_extension_spec.py file. However, should ns_builder.include_type still work without including the entire namespace?
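
One plausible reading of the traceback is that, when validating against mylab alone, only the types that namespace explicitly pulls in are visible, so a lookup for NWBFile fails. The toy model below illustrates that reading only. ToyNamespace and MyLabSeries are hypothetical names, and the class is not hdmf's data structure or lookup logic; it just mimics the error message seen above.

```python
class ToyNamespace:
    """Toy model of a namespace; not hdmf's actual implementation."""

    def __init__(self, name, own_types):
        self.name = name
        self.types = set(own_types)

    def include_type(self, type_name, source):
        # Make a single named type from another namespace visible.
        if type_name not in source.types:
            raise ValueError(f"{type_name!r} not in namespace {source.name}")
        self.types.add(type_name)

    def include_namespace(self, source):
        # Make every type from the other namespace visible.
        self.types |= source.types

    def get_validator(self, data_type):
        if data_type not in self.types:
            raise ValueError(
                f"data type {data_type!r} not found in namespace {self.name}"
            )
        return f"validator for {data_type}"


core = ToyNamespace("core", {"NWBFile", "ElectricalSeries", "NWBDataInterface"})

# Type-by-type inclusion: NWBFile is never made visible.
narrow = ToyNamespace("mylab", {"MyLabSeries"})
narrow.include_type("ElectricalSeries", core)
narrow.include_type("NWBDataInterface", core)

# Whole-namespace inclusion: every core type, NWBFile included, is visible.
wide = ToyNamespace("mylab", {"MyLabSeries"})
wide.include_namespace(core)
```

In this toy model, narrow.get_validator("NWBFile") raises a ValueError while wide.get_validator("NWBFile") succeeds. Whether include_type alone should be enough in the real validator is exactly the open question raised here.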

Contributor

Interesting. For now, let's update that to ns_builder.include_namespace("core").

We need to think through what it means to run the pynwb validator on a file, validating against a particular non-core namespace. Should the core namespace always be included during validation, regardless of whether the extension includes the core namespace, and the choices are either use the core namespace cached in the file or the one installed with pynwb? I think so...

I think we eventually want to move toward not even allowing validation against a particular namespace. Either the file is valid given its cached namespaces (or the namespaces installed by pynwb and loaded by the user), or the file is not. Otherwise, we run into odd issues such as hdmf-dev/hdmf#608 and hdmf-dev/hdmf#525.

I would say let's make the above change for now, and iterate on these other ideas in a separate PR which does not need to make it in pynwb 3.0.0rc1.

Contributor Author

> Should the core namespace always be included during validation, regardless of whether the extension includes the core namespace, and the choices are either use the core namespace cached in the file or the one installed with pynwb? I think so...

I think so too, but agreed that it would definitely be helpful to compile and discuss all these different scenarios to get a better idea of the intended end-goal validation behavior.

> I would say let's make the above change for now, and iterate on these other ideas in a separate PR which does not need to make it in pynwb 3.0.0rc1

Sounds good!

@stephprince stephprince changed the base branch from dev to release-3.0.0 January 2, 2025 17:36
@stephprince stephprince requested a review from rly January 2, 2025 21:25
@stephprince stephprince merged commit f02e61b into release-3.0.0 Jan 3, 2025
24 of 25 checks passed
@stephprince stephprince deleted the upgrade-validator branch January 3, 2025 17:02