Cannot save non-ASCII characters to NetCDF #5125

@trexfeathers

🐛 Bug Report

From @gavinevans

Attempting to save a Cube including a string AuxCoord with non-ASCII characters (i.e. Unicode characters) raises the following exception:

UnicodeEncodeError: 'ascii' codec can't encode character '\xe8' in position 0: ordinal not in range(128)

How To Reproduce

Steps to reproduce the behaviour:

import iris
from iris.coords import AuxCoord, DimCoord
from iris.cube import Cube

spot_index = DimCoord([0, 1], long_name='site_index', units=1)

station_name = AuxCoord(["Robièi", "Mühleberg"], long_name="station_name")
# This one works:
# station_name = AuxCoord(["Robiei", "Muhleberg"], long_name="station_name")

cube = Cube(
    [3, 4],
    dim_coords_and_dims=[(spot_index, 0)],
    aux_coords_and_dims=[(station_name, 0)]
)

iris.save(cube, "tmp.nc")

Expected behaviour

Should save with no exception (as happens when using the commented line above).

Environment

  • OS & Version: RHEL7
  • Iris Version: tested with v3.2.1.post0 and v3.4.0

Additional context

Related:

I think the fix will hinge on allowing for the extra bytes needed to store encoded Unicode characters. We currently divide the item size by 4 (NumPy's "U" dtype uses 4 bytes per character), which I think means we always assume a Unicode string can be converted to an ASCII one of the same length:

string_dimension_depth = data.dtype.itemsize
if data.dtype.kind == "U":
    string_dimension_depth //= 4

Changing this could have loading consequences too?
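To illustrate the size mismatch, here is a minimal sketch using only the Python standard library and the station names from the reproduction. It emulates the `itemsize // 4` calculation without NumPy and compares the result with the number of bytes each name needs once encoded. UTF-8 as the target encoding is an assumption here; this issue does not say which encoding a fix would use.

```python
# Station names from the reproduction, written with escapes so the
# code points are unambiguous (\u00e8 is "è", \u00fc is "ü").
names = ["Robi\u00e8i", "M\u00fchleberg"]

# NumPy's "U" dtype stores 4 bytes per character, so itemsize // 4 is the
# character count of the longest string. Emulated here without NumPy.
itemsize = 4 * max(len(name) for name in names)
depth_from_chars = itemsize // 4

# Bytes needed per name if encoded as UTF-8 (assumed encoding).
utf8_lengths = [len(name.encode("utf-8")) for name in names]
depth_from_utf8 = max(utf8_lengths)

# Encoding to ASCII fails outright for these names.
try:
    names[0].encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(depth_from_chars, depth_from_utf8, ascii_failed)
```

Each accented character takes two bytes in UTF-8, so a string dimension sized from the character count would be one byte too short for both names.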

Traceback with Iris v3.4:
Traceback (most recent call last):
  File ".../iris/lib/2023-01-03_gavin.py", line 17, in <module>
    iris.save(cube, "tmp.nc")
  File ".../iris/lib/iris/io/__init__.py", line 457, in save
    saver(source, target, **kwargs)
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 2754, in save
    sman.write(
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 755, in write
    self._add_aux_coords(cube, cf_var_cube, cube_dimensions)
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 1088, in _add_aux_coords
    return self._add_inner_related_vars(
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 1053, in _add_inner_related_vars
    cf_name = self._create_generic_cf_array_var(
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 1917, in _create_generic_cf_array_var
    new_data[index_slice] = list(
UnicodeEncodeError: 'ascii' codec can't encode character '\xe8' in position 0: ordinal not in range(128)
