Commit 202efae

Merge branch 'main' into gpu-codecs
2 parents 594ae65 + f4d1221 commit 202efae

11 files changed: +71 -17 lines changed

changes/3695.bugfix.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Raise error when trying to encode :class:`numpy.dtypes.StringDType` with `na_object` set.

docs/contributing.md

Lines changed: 8 additions & 8 deletions
@@ -131,7 +131,7 @@ The hooks can be installed locally by running:
 prek install
 ```

-This would run the checks every time a commit is created locally. The checks will by default only run on the files modified by a commit, but the checks can be triggered for all the files by running:
+This will run the checks every time a commit is created locally. The checks will by default only run on the files modified by a commit, but the checks can be triggered for all the files by running:

 ```bash
 prek run --all-files
@@ -249,33 +249,33 @@ Pull requests submitted by an external contributor should be reviewed and approv

 Pull requests should not be merged until all CI checks have passed (GitHub Actions, Codecov) against code that has had the latest main merged in.

-Before merging the milestone must be set either to decide whether a PR will be in the next patch, minor, or major release. The next section explains which types of changes go in each release.
+Before merging, the milestone must be set to decide whether a PR will be in the next patch, minor, or major release. The next section explains which types of changes go in each release.

 ## Compatibility and versioning policies

 ### Versioning

-Versions of this library are identified by a triplet of integers with the form `<major>.<minor>.<patch>`, for example `3.0.4`. A release of `zarr-python` is associated with a new version identifier. That new identifier is generated by incrementing exactly one of the components of the previous version identifier by 1. When incrementing the `major` component of the version identifier, the `minor` and `patch` components is reset to 0. When incrementing the minor component, the patch component is reset to 0.
+Versions of this library are identified by a triplet of integers with the form `<major>.<minor>.<patch>`, for example `3.0.4`. A release of `zarr-python` is associated with a new version identifier. That new identifier is generated by incrementing exactly one of the components of the previous version identifier by 1. When incrementing the `major` component of the version identifier, the `minor` and `patch` components are reset to 0. When incrementing the minor component, the patch component is reset to 0.

 Releases are classified by the library changes contained in that release. This classification determines which component of the version identifier is incremented on release.

 * **major** releases (for example, `2.18.0` -> `3.0.0`) are for changes that will require extensive adaptation efforts from many users and downstream projects. For example, breaking changes to widely-used user-facing APIs should only be applied in a major release.

 Users and downstream projects should carefully consider the impact of a major release before adopting it. In advance of a major release, developers should communicate the scope of the upcoming changes, and help users prepare for them.

-* **minor** releases (for example, `3.0.0` -> `3.1.0`) are for changes that do not require significant effort from most users or downstream downstream projects to respond to. API changes are possible in minor releases if the burden on users imposed by those changes is sufficiently small.
+* **minor** releases (for example, `3.0.0` -> `3.1.0`) are for changes that do not require significant effort from most users or downstream projects to respond to. API changes are possible in minor releases if the burden on users imposed by those changes is sufficiently small.

 For example, a recently released API may need fixes or refinements that are breaking, but low impact due to the recency of the feature. Such API changes are permitted in a minor release.

 Minor releases are safe for most users and downstream projects to adopt.

 * **patch** releases (for example, `3.1.0` -> `3.1.1`) are for changes that contain no breaking or behaviour changes for downstream projects or users. Examples of changes suitable for a patch release are bugfixes and documentation improvements.

-Users should always feel safe upgrading to a the latest patch release.
+Users should always feel safe upgrading to the latest patch release.

 Note that this versioning scheme is not consistent with [Semantic Versioning](https://semver.org/). Contrary to SemVer, the Zarr library may release breaking changes in `minor` releases, or even `patch` releases under exceptional circumstances. But we should strive to avoid doing so.

-A better model for our versioning scheme is [Intended Effort Versioning](https://jacobtomlinson.dev/effver/), or "EffVer". The guiding principle off EffVer is to categorize releases based on the *expected effort required to upgrade to that release*.
+A better model for our versioning scheme is [Intended Effort Versioning](https://jacobtomlinson.dev/effver/), or "EffVer". The guiding principle of EffVer is to categorize releases based on the *expected effort required to upgrade to that release*.

 Zarr developers should make changes as smooth as possible for users. This means making backwards-compatible changes wherever possible. When a backwards-incompatible change is necessary, users should be notified well in advance, e.g. via informative deprecation warnings.

@@ -288,12 +288,12 @@ If an existing Zarr format version changes, or a new version of the Zarr format
 ## Release procedure

 Open an issue on GitHub announcing the release using the release checklist template:
-[https://github.com/zarr-developers/zarr-python/issues/new?template=release-checklist.md](https://github.com/zarr-developers/zarr-python/issues/new?template=release-checklist.md>). The release checklist includes all steps necessary for the release.
+[https://github.com/zarr-developers/zarr-python/issues/new?template=release-checklist.md](https://github.com/zarr-developers/zarr-python/issues/new?template=release-checklist.md). The release checklist includes all steps necessary for the release.

 ## Benchmarks

 Zarr uses [pytest-benchmark](https://pytest-benchmark.readthedocs.io/en/latest/) for running
-performance benchmarks as part of our test suite. The benchmarks can be are found in `tests/benchmarks`.
+performance benchmarks as part of our test suite. The benchmarks are found in `tests/benchmarks`.
 By default pytest is configured to run these benchmarks as plain tests (i.e., no benchmarking). To run
 a benchmark with timing measurements, use the `--benchmark-enable` when invoking `pytest`.

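The versioning rule in `docs/contributing.md` (increment exactly one component by 1, reset every lower component to 0) can be stated as a small function. This is a minimal sketch for illustration only: `bump_version` is a hypothetical helper, not part of zarr-python or its release tooling, and it assumes plain `<major>.<minor>.<patch>` strings with no pre-release or development suffixes.

```python
def bump_version(version: str, component: str) -> str:
    """Return the next version after incrementing one component.

    Hypothetical helper: exactly one of major/minor/patch is incremented
    by 1, and every lower component is reset to 0. Assumes a plain
    "<major>.<minor>.<patch>" string (unpacking fails otherwise).
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if component == "major":
        return f"{major + 1}.0.0"  # minor and patch reset to 0
    if component == "minor":
        return f"{major}.{minor + 1}.0"  # patch reset to 0
    if component == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version component: {component!r}")


# The three examples given in the contributing guide.
print(bump_version("2.18.0", "major"))  # 3.0.0
print(bump_version("3.0.0", "minor"))  # 3.1.0
print(bump_version("3.1.0", "patch"))  # 3.1.1
```

Deciding which component to pass is the EffVer judgment about expected upgrade effort; the function only encodes the mechanical part of the rule.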
docs/quick-start.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-This section will help you get up and running with
+This section will help you get up and running with
 the Zarr library in Python to efficiently manage and analyze multi-dimensional arrays.

 ### Creating an Array
@@ -92,7 +92,7 @@ spam[:] = np.arange(10)
 print(root.tree())
 ```

-This creates a group with two datasets: `foo` and `bar`.
+This creates a group hierarchy with a group (`foo`) and two arrays (`bar` and `spam`).

 #### Batch Hierarchy Creation

docs/user-guide/arrays.md

Lines changed: 2 additions & 2 deletions
@@ -72,7 +72,7 @@ print(z[:, 0])
 print(z[:])
 ```

-Read more about NumPy-style indexing can be found in the
+More information about NumPy-style indexing can be found in the
 [NumPy documentation](https://numpy.org/doc/stable/user/basics.indexing.html).

 ## Persistent arrays
@@ -297,7 +297,7 @@ array without loading the entire array into memory.
 Note that although this functionality is similar to some of the advanced
 indexing capabilities available on NumPy arrays and on h5py datasets, **the Zarr
 API for advanced indexing is different from both NumPy and h5py**, so please
-read this section carefully. For a complete description of the indexing API,
+read this section carefully. For a complete description of the indexing API,
 see the documentation for the [`zarr.Array`][] class.

 ### Indexing with coordinate arrays

docs/user-guide/extending.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ of the array data. Examples include compression codecs, such as

 Custom codecs for Zarr are implemented by subclassing the relevant base class, see
 [`zarr.abc.codec.ArrayArrayCodec`][], [`zarr.abc.codec.ArrayBytesCodec`][] and
-[`zarr.abc.codec.BytesBytesCodec`][]. Most custom codecs should implemented the
+[`zarr.abc.codec.BytesBytesCodec`][]. Most custom codecs should implement the
 `_encode_single` and `_decode_single` methods. These methods operate on single chunks
 of the array data. Alternatively, custom codecs can implement the `encode` and `decode`
 methods, which operate on batches of chunks, in case the codec is intended to implement

docs/user-guide/groups.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ root = zarr.create_group(store=store)
 print(root)
 ```

-Groups have a similar API to the Group class from [h5py](https://www.h5py.org/). For example, groups can contain other groups:
+Groups have a similar API to the Group class from [h5py](https://www.h5py.org/). For example, groups can contain other groups:

 ```python exec="true" session="groups" source="above"
 foo = root.create_group('foo')

docs/user-guide/storage.md

Lines changed: 2 additions & 2 deletions
@@ -91,8 +91,8 @@ print(group)

 ## Explicit Store Creation

-In some cases, it may be helpful to create a store instance directly. Zarr-Python offers four
-built-in store: [`zarr.storage.LocalStore`][], [`zarr.storage.FsspecStore`][],
+In some cases, it may be helpful to create a store instance directly. Zarr-Python offers
+built-in stores: [`zarr.storage.LocalStore`][], [`zarr.storage.FsspecStore`][],
 [`zarr.storage.ZipStore`][], [`zarr.storage.MemoryStore`][], and [`zarr.storage.ObjectStore`][].

 ### Local Store

docs/user-guide/v3_migration.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ so we can improve this guide.

 The goals described above necessitated some breaking changes to the API (hence the
 major version update), but where possible we have maintained backwards compatibility
-in the most widely used parts of the API. This in the [`zarr.Array`][] and
+in the most widely used parts of the API. This includes the [`zarr.Array`][] and
 [`zarr.Group`][] classes and the "top-level API" (e.g. [`zarr.open_array`][] and
 [`zarr.open_group`][]).

src/zarr/core/dtype/npy/string.py

Lines changed: 37 additions & 0 deletions
@@ -742,6 +742,43 @@ class VariableLengthUTF8(UTF8Base[np.dtypes.StringDType]): # type: ignore[type-

     dtype_cls = np.dtypes.StringDType

+    @classmethod
+    def from_native_dtype(cls, dtype: TBaseDType) -> Self:
+        """
+        Create an instance of this data type from a compatible NumPy data type.
+        We reject NumPy StringDType instances that have the `na_object` field set,
+        because this is not representable by the Zarr `string` data type.
+
+        Parameters
+        ----------
+        dtype : TBaseDType
+            The native data type.
+
+        Returns
+        -------
+        Self
+            An instance of this data type.
+
+        Raises
+        ------
+        DataTypeValidationError
+            If the input is not compatible with this data type.
+        ValueError
+            If the input is `numpy.dtypes.StringDType` and has `na_object` set.
+        """
+        if cls._check_native_dtype(dtype):
+            if hasattr(dtype, "na_object"):
+                msg = (
+                    f"Zarr data type resolution from {dtype} failed. "
+                    "Attempted to resolve a zarr data type from a `numpy.dtypes.StringDType` "
+                    "with `na_object` set, which is not supported."
+                )
+                raise ValueError(msg)
+            return cls()
+        raise DataTypeValidationError(
+            f"Invalid data type: {dtype}. Expected an instance of {cls.dtype_cls}"
+        )
+
     def to_native_dtype(self) -> np.dtypes.StringDType:
         """
         Create a NumPy string dtype from this VariableLengthUTF8 ZDType.

src/zarr/core/dtype/registry.py

Lines changed: 4 additions & 0 deletions
@@ -161,6 +161,10 @@ def match_dtype(self, dtype: TBaseDType) -> ZDType[TBaseDType, TBaseScalar]:
             raise ValueError(msg)
         matched: list[ZDType[TBaseDType, TBaseScalar]] = []
         for val in self.contents.values():
+            # DataTypeValidationError means "this dtype doesn't match me", which is
+            # expected and suppressed. Other exceptions (e.g. ValueError for a dtype
+            # that matches the type but has an invalid configuration) are propagated
+            # to the caller.
             with contextlib.suppress(DataTypeValidationError):
                 matched.append(val.from_native_dtype(dtype))
         if len(matched) == 1:

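The comment added to `match_dtype` describes a two-tier error convention: one exception type means "this candidate does not handle that dtype" and is swallowed, while any other exception is a real error and reaches the caller. The sketch below illustrates that pattern in isolation, without zarr. All names are hypothetical: `DtypeMismatch` plays the role of `DataTypeValidationError`, plain dicts stand in for NumPy dtypes, and the `"na_object"` key mimics the `hasattr(dtype, "na_object")` check.

```python
import contextlib


class DtypeMismatch(Exception):
    """Hypothetical stand-in for "this candidate does not handle this dtype"."""


class StringType:
    """Hypothetical candidate that accepts string dtypes without an NA sentinel."""

    @classmethod
    def from_native_dtype(cls, dtype: dict) -> "StringType":
        if dtype.get("kind") != "string":
            raise DtypeMismatch(f"{dtype!r} is not a string dtype")
        if "na_object" in dtype:
            # The kind matches, but this configuration is unsupported:
            # raise a different exception so the registry does not hide it.
            raise ValueError(f"{dtype!r} has na_object set, which is not supported")
        return cls()


class IntType:
    """Hypothetical candidate that accepts integer dtypes."""

    @classmethod
    def from_native_dtype(cls, dtype: dict) -> "IntType":
        if dtype.get("kind") != "int":
            raise DtypeMismatch(f"{dtype!r} is not an int dtype")
        return cls()


def match_dtype(candidates: list, dtype: dict):
    """Return the single candidate instance that accepts ``dtype``."""
    matched = []
    for candidate in candidates:
        # Only "does not match" is suppressed; ValueError and others propagate.
        with contextlib.suppress(DtypeMismatch):
            matched.append(candidate.from_native_dtype(dtype))
    if len(matched) == 1:
        return matched[0]
    raise LookupError(f"expected exactly one match for {dtype!r}, got {len(matched)}")


candidates = [StringType, IntType]
print(type(match_dtype(candidates, {"kind": "int"})).__name__)  # IntType
try:
    match_dtype(candidates, {"kind": "string", "na_object": None})
except ValueError as err:
    print(f"rejected: {err}")
```

In this sketch, suppressing every exception instead of only `DtypeMismatch` would turn the unsupported `na_object` case into a generic "no match" failure and hide the actual reason from the user.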