feat: rectilinear chunks in Zarr backend #11279

Open
maxrjones wants to merge 8 commits into pydata:main from maxrjones:poc/unified-zarr-chunk-grid

@maxrjones
Contributor

Description

This PR accompanies zarr-developers/zarr-python#3802, adding support for rectilinear zarr chunks in Xarray.

The user-facing difference between this PR and zarr-developers/zarr-python#3369 / #10880 is that rectilinear chunks are gated behind zarr.config.set({'array.rectilinear_chunks': True}) (or ZARR_ARRAY__RECTILINEAR_CHUNKS=True), disabled by default. This gives zarr-python developers an opportunity to gracefully finalize the API, which is especially valuable given that rectilinear chunks are the largest feature addition in zarr-python since Zarr V3/sharding.

What changed

  • _determine_zarr_chunks now passes through variable (non-uniform) chunk sizes when writing to Zarr V3 with the unified ChunkGrid API, instead of raising an error.
  • Reading correctly reconstructs chunk information from both RegularChunkGrid and RectilinearChunkGrid metadata.
  • safe_chunks and align_chunks validation is skipped for rectilinear (tuple-of-tuples) chunks, since those checks assume uniform chunk sizes.
  • Error messages for chunk validation failures now distinguish between Zarr V2 and V3 and point users toward the rectilinear chunks extension.
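The distinction the list above draws between regular and rectilinear chunking can be sketched in plain Python. This is a minimal illustration, not the PR's code: `is_rectilinear` is a hypothetical helper name, and its test mirrors the `len(set(chunks[:-1])) > 1` expression from the diff under review, applied to Dask-style tuple-of-tuples chunks. It does not cover the separate rule about the size of the final chunk.

```python
def is_rectilinear(var_chunks):
    """Return True if any dimension has non-uniform chunks, ignoring the final chunk.

    `var_chunks` is a Dask-style tuple of tuples: one inner tuple of chunk
    lengths per dimension. Hypothetical helper, for illustration only.
    """
    return any(len(set(chunks[:-1])) > 1 for chunks in var_chunks)


regular = ((10, 10, 10, 5), (20, 20))  # uniform except for the final chunk
variable = ((10, 20, 30), (20, 20))    # first dimension is non-uniform

print(is_rectilinear(regular))
print(is_rectilinear(variable))
```

Under this sketch, `regular` would fit a regular chunk grid, while `variable` would need the rectilinear extension on Zarr V3 and would be rejected on Zarr V2.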

To-do

  • expand test coverage: error messages when using V2 or when the config flag is off, plus a multi-dimensional test case
  • decide whether to continue silently bypassing safe_chunks/align_chunks or add validations
  • remove upstream version pin

Checklist

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

AI Disclosure

  • This PR contains AI-generated content.
    • I have tested any AI-generated content in my PR.
    • I take responsibility for any AI-generated content in my PR. Tools: Claude Code

@github-actions github-actions bot added the topic-backends, topic-zarr (Related to zarr storage library), and io labels Apr 2, 2026
@headtr1ck
Collaborator

Is this a duplicate of #10880?

@maxrjones
Contributor Author

> Is this a duplicate of #10880?

This would supersede #10880. It implements the same feature, but using a different upstream implementation (zarr-developers/zarr-python#3802), which will likely be merged into Zarr-Python in the coming days. zarr-developers/zarr-python#3802 supersedes zarr-developers/zarr-python#3369, which #10880 was built on.

Collaborator

@keewis keewis left a comment

I'll need to look into this a bit more, but for now:

> is skipped for rectilinear (tuple-of-tuples) chunks since those checks assume uniform chunk sizes.

That's what the current checks do, but their purpose is to support safely appending data without write conflicts between execution workers (dask / cubed / etc). Do we maybe need different checks that verify that zarr chunks do not overlap with multiple execution chunks?
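One way to make the question above concrete, as a rough sketch under stated assumptions rather than a proposal for this PR: along a single dimension, a Zarr chunk is written by more than one execution chunk exactly when an execution-chunk edge falls strictly inside it. The function names below are hypothetical, and the sketch only handles whole-array writes where both chunkings cover the same length, not appends or region writes.

```python
from itertools import accumulate


def boundaries(chunks):
    """Cumulative end offsets of each chunk along one dimension."""
    return set(accumulate(chunks))


def zarr_chunks_split_across_workers(zarr_chunks, dask_chunks):
    """Return the Dask chunk edges that fall strictly inside a Zarr chunk.

    Both arguments are tuples of chunk lengths along a single dimension.
    An empty result means every Zarr chunk is written by exactly one Dask
    chunk. Hypothetical helper, for illustration only.
    """
    if sum(zarr_chunks) != sum(dask_chunks):
        raise ValueError("chunk tuples must cover the same length")
    return sorted(boundaries(dask_chunks) - boundaries(zarr_chunks))


zarr_chunks = (10, 20, 30)  # rectilinear Zarr chunks, edges at 10, 30, 60

# Dask edges at 30 and 60 coincide with Zarr edges: no shared Zarr chunk.
print(zarr_chunks_split_across_workers(zarr_chunks, (30, 30)))

# The Dask edge at 20 cuts through the Zarr chunk spanning [10, 30).
print(zarr_chunks_split_across_workers(zarr_chunks, (20, 40)))
```

Comparing edge sets works for uniform and variable chunk sizes alike, so a check of this shape would not need to assume regular chunking.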

  if any(len(set(chunks[:-1])) > 1 for chunks in var_chunks):
      raise ValueError(
-         "Zarr requires uniform chunk sizes except for final chunk. "
+         "Zarr v2 requires uniform chunk sizes except for final chunk. "
Collaborator

Suggested change
- "Zarr v2 requires uniform chunk sizes except for final chunk. "
+ "Zarr v2 requires uniform chunk sizes except for the final chunk. "

f"than the first. Variable named {name!r} has incompatible Dask chunks {var_chunks!r}."
"Consider either rechunking using `chunk()` or instead deleting "
"or modifying `encoding['chunks']`."
"Final chunk of a Zarr v2 array or a Zarr v3 array without the "
Collaborator

Suggested change
- "Final chunk of a Zarr v2 array or a Zarr v3 array without the "
+ "The final chunk of a Zarr v2 array or a Zarr v3 array without the "

Labels

io topic-backends topic-zarr Related to zarr storage library

3 participants