feat: rectilinear chunks in Zarr backend #11279
maxrjones wants to merge 8 commits into pydata:main
Is this a duplicate of #10880?
This would supersede #10880. It implements the same feature, but using a different upstream implementation (zarr-developers/zarr-python#3802), which will likely be merged into Zarr-Python in the coming days. zarr-developers/zarr-python#3802 supersedes zarr-developers/zarr-python#3369, which #10880 was built on top of.
I'll need to look into this a bit more, but for now:
> is skipped for rectilinear (tuple-of-tuples) chunks since those checks assume uniform chunk sizes.
That's what the current checks do, but their purpose is to support safely appending data without write conflicts between execution workers (dask / cubed / etc). Do we maybe need different checks that verify that zarr chunks do not overlap with multiple execution chunks?
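One possible shape for such a check, sketched in plain Python for a single dimension. This is only an illustration of the idea, not code from this PR or from Xarray: the helper names `chunk_boundaries` and `split_zarr_chunks` are hypothetical, and it assumes both chunkings start at offset 0 and cover the same extent. A write is treated as safe when every execution-chunk boundary coincides with a Zarr chunk boundary, so no Zarr chunk is touched by more than one worker.

```python
from itertools import accumulate


def chunk_boundaries(sizes):
    """Return the set of end offsets of each chunk along one dimension."""
    return set(accumulate(sizes))


def split_zarr_chunks(zarr_sizes, exec_sizes):
    """Return execution-chunk boundaries that fall inside a Zarr chunk.

    An empty list means each Zarr chunk lies within a single execution
    chunk, so no two workers would write to the same Zarr chunk.
    Hypothetical helper: assumes both chunkings start at offset 0.
    """
    if sum(zarr_sizes) != sum(exec_sizes):
        raise ValueError("both chunkings must cover the same extent")
    return sorted(chunk_boundaries(exec_sizes) - chunk_boundaries(zarr_sizes))


# Rectilinear Zarr chunks (2, 3, 5) written from execution chunks (5, 5):
# every execution boundary is also a Zarr boundary, so nothing is split.
safe = split_zarr_chunks((2, 3, 5), (5, 5))

# Zarr chunks (4, 6) written from execution chunks (5, 5): offset 5 falls
# inside the second Zarr chunk, so two workers would write to it.
unsafe = split_zarr_chunks((4, 6), (5, 5))
```

Comparing boundary sets rather than chunk sizes means the same check would cover uniform grids as a special case.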
  if any(len(set(chunks[:-1])) > 1 for chunks in var_chunks):
      raise ValueError(
-         "Zarr requires uniform chunk sizes except for final chunk. "
+         "Zarr v2 requires uniform chunk sizes except for final chunk. "
- "Zarr v2 requires uniform chunk sizes except for final chunk. "
+ "Zarr v2 requires uniform chunk sizes except for the final chunk. "
f"than the first. Variable named {name!r} has incompatible Dask chunks {var_chunks!r}."
"Consider either rechunking using `chunk()` or instead deleting "
"or modifying `encoding['chunks']`."
"Final chunk of a Zarr v2 array or a Zarr v3 array without the "
- "Final chunk of a Zarr v2 array or a Zarr v3 array without the "
+ "The final chunk of a Zarr v2 array or a Zarr v3 array without the "
Description
This PR accompanies zarr-developers/zarr-python#3802, adding support for rectilinear zarr chunks in Xarray.
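As a rough illustration of what separates rectilinear chunking from a regular grid, the sketch below reuses the two conditions from the existing checks quoted in the review above (all chunks but the last are equal, and the last is no larger than the first). The function names `is_regular` and `classify` are hypothetical and are not part of Xarray or zarr-python.

```python
def is_regular(chunks):
    """Check one dimension's chunk sizes against the regular-grid rules.

    Regular here means: every chunk except the final one has the same
    size, and the final chunk is no larger than the first.
    """
    if not chunks:
        return True
    if len(set(chunks[:-1])) > 1:
        return False
    return chunks[-1] <= chunks[0]


def classify(var_chunks):
    """Label a tuple-of-tuples of chunk sizes (one inner tuple per dimension)."""
    if all(is_regular(chunks) for chunks in var_chunks):
        return "regular"
    return "rectilinear"


# Uniform chunks with a shorter final chunk fit a regular grid.
regular_example = classify(((10, 10, 4), (5, 5)))

# Varying sizes along the first dimension need a rectilinear grid.
rectilinear_example = classify(((10, 20, 30), (5, 5)))
```

In this sketch anything that fails the regular-grid rules is labeled rectilinear; whether a given layout can actually be written depends on the Zarr format version and the configuration described below.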
The user-facing difference between this PR and zarr-developers/zarr-python#3369 / #10880 is that rectilinear chunks are gated behind `zarr.config.set({'array.rectilinear_chunks': True})` (or `ZARR_ARRAY__RECTILINEAR_CHUNKS=True`) and are disabled by default. This gives zarr-python developers an opportunity to gracefully finalize the API, which is especially valuable given that rectilinear chunks are the largest feature addition in zarr-python since Zarr V3/sharding.

What changed
- `_determine_zarr_chunks` now passes through variable (non-uniform) chunk sizes when writing to Zarr V3 with the unified ChunkGrid API, instead of raising an error.
- `safe_chunks` and `align_chunks` validation is skipped for rectilinear (tuple-of-tuples) chunks, since those checks assume uniform chunk sizes.

To-do
Checklist
- `whats-new.rst`
- `api.rst`

AI Disclosure