Releases: otsaloma/dataiter

1.2

05 Oct 18:11

  • DataFrame.count: Avoid name clash if column n already exists
  • dt.isoformat: New function
  • mode: Fix Numba variant on some recent versions of Numba

1.1

10 Jun 20:15

  • DataFrame.pivot_longer: New method
  • DataFrame.pivot_wider: New method
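
As a rough illustration of what pivoting between wide and long layouts means, here is a minimal sketch on plain lists of dicts. The helpers `to_long` and `to_wide` and their parameter names are hypothetical and are not the signatures of `DataFrame.pivot_longer` or `DataFrame.pivot_wider`, which these notes do not describe.

```python
def to_long(rows, id_key, name_key="name", value_key="value"):
    """Turn one row per id into one row per (id, column) pair."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key != id_key:
                long_rows.append({id_key: row[id_key], name_key: key, value_key: value})
    return long_rows


def to_wide(rows, id_key, name_key="name", value_key="value"):
    """Inverse of to_long: collect (id, column) pairs back into one row per id."""
    wide = {}
    for row in rows:
        target = wide.setdefault(row[id_key], {id_key: row[id_key]})
        target[row[name_key]] = row[value_key]
    return list(wide.values())


wide_rows = [{"id": 1, "a": 10, "b": 20}, {"id": 2, "a": 30, "b": 40}]
long_rows = to_long(wide_rows, "id")
```

Applying `to_wide(long_rows, "id")` gives back the original wide rows, which is the round-trip property one would usually expect from such a pair of operations.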

1.0

07 Feb 20:06

  • Silence warnings about writing NPZ files with StringDType: "UserWarning: Custom dtypes are saved as python objects using the pickle protocol. Loading this file requires allow_pickle=True to be set."

Dataiter can now be considered stable. If upgrading from <= 0.51, please
read the release notes for 0.99–0.9999.
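
For readers curious how a warning like the one quoted above can be silenced in general, here is a minimal sketch using only the standard `warnings` module. It is not necessarily how Dataiter does it; `save_quietly` and `noisy_save` are hypothetical names, and `noisy_save` merely stands in for a function that writes an NPZ file.

```python
import warnings

# Start of the message text quoted in the release note.
NPZ_MESSAGE = "Custom dtypes are saved as python objects using the pickle protocol"


def save_quietly(save, *args, **kwargs):
    """Call save(*args, **kwargs) with the custom-dtype UserWarning suppressed."""
    with warnings.catch_warnings():
        # 'message' is a regular expression matched against the start
        # of the warning text; other warnings are left untouched.
        warnings.filterwarnings("ignore", message=NPZ_MESSAGE, category=UserWarning)
        return save(*args, **kwargs)


def noisy_save(path):
    # Hypothetical stand-in that emits the same warning text.
    warnings.warn(
        NPZ_MESSAGE + ". Loading this file requires allow_pickle=True to be set.",
        UserWarning,
    )
    return path
```

Filtering by message and category, rather than ignoring all warnings, keeps unrelated warnings from the same call visible.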

0.9999

12 Jan 19:32

  • New module dataiter.regex for vectorized regular expressions
  • Add proxy object Vector.dt for dataiter.dt
  • Add proxy object Vector.re for dataiter.regex
  • Add proxy object Vector.str for numpy.strings
  • Use PyArrow instead of Pandas to read and write CSV files
  • Replace Pandas dependency with PyArrow

This is likely to be a breaking change for some rare, unusually
formatted CSV files that Pandas and PyArrow parse differently, resulting
in, for example, differently guessed data types or differently detected
missing value markers. The note about stability below release 0.99 still
applies.

0.999

15 Dec 14:54

  • DataFrame.from_arrow: Remove strings_as_object argument
  • DataFrame.from_pandas: Remove strings_as_object argument
  • DataFrame.read_csv: Remove strings_as_object argument
  • DataFrame.read_parquet: Remove strings_as_object argument
  • GeoJSON.read: Remove strings_as_object argument
  • ListOfDicts.to_data_frame: Remove strings_as_object argument
  • read_csv: Remove strings_as_object argument
  • read_geojson: Remove strings_as_object argument
  • read_parquet: Remove strings_as_object argument
  • Vector.as_string: Remove length argument
  • Vector.is_na: Fix to work in multidimensional cases where the elements of an object vector are arrays/vectors
  • Vector.rank: Change default method to "min"
  • Vector.rank: Remove method "average"
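
To illustrate the difference between the "min" and "average" ranking methods named above, here is a small pure-Python sketch of their conventional meaning for tied values. The functions `rank_min` and `rank_average` are hypothetical, and it is an assumption that Dataiter follows the same convention; details such as missing value handling are not covered.

```python
def rank_min(values):
    """Tied values all get the lowest rank of their group."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    return [first_position[value] for value in values]


def rank_average(values):
    """Tied values all get the mean of the ranks they span."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


values = [30, 10, 20, 20]
min_ranks = rank_min(values)
average_ranks = rank_average(values)
```

With "min", ranks stay integers even when there are ties, whereas "average" produces fractional ranks such as 2.5.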

This is a breaking change to switch the string data type from the
fixed-width str_ a.k.a. <U# to the variable-width StringDType
introduced in NumPy 2.0. The main benefit is greatly reduced memory use,
making strings usable without needing to be careful or falling back to
object. The note about stability below release 0.99 still applies.

Note that because StringDType exists only in NumPy >= 2.0, NPZ or Pickle
files saved with this release cannot be opened using Dataiter < 0.99 and
NumPy < 2.0. If you need that kind of interoperability, consider using
the Parquet file format.

0.99

17 Aug 17:56

  • Adapt to changes in NumPy 2.0
  • Bump NumPy dependency to >= 2.0

This is a minimal change to be NumPy 2.0 compatible. In the 0.99+
releases, we plan to adopt the new NumPy string dtype and fix any
regressions that come up, leading to a 1.0 release when everything looks
to be working reliably (#26). Anyone looking for extreme stability
should consider avoiding the 0.99+ releases and waiting for 1.0.

0.51

24 Jun 19:56

  • Mark NumPy dependency as < 2.0

0.50

05 Apr 21:41

  • ListOfDicts.drop_na: New method
  • ListOfDicts.keys: New method
  • ListOfDicts.print_memory_use: New method
  • Fix tabular display of Unicode characters with width != 1
  • Add dependency on wcwidth: https://pypi.org/project/wcwidth
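
To show why character width matters for tabular display, here is a rough standard-library approximation of display width. It is only a sketch: `display_width` and `pad` are hypothetical helpers, and the wcwidth package handles more cases than this does.

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal columns that text occupies."""
    width = 0
    for char in text:
        if unicodedata.combining(char):
            continue  # combining marks occupy no column of their own
        # Wide and fullwidth characters (e.g. CJK) take two columns.
        width += 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1
    return width


def pad(text, width):
    """Right-pad text with spaces to the given display width."""
    return text + " " * max(0, width - display_width(text))
```

Padding by `len(text)` instead would misalign any column that contains wide or combining characters.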

0.49

08 Nov 19:29

  • dt: Handle all NaT input
  • Migrate from setup.py to hatch and pyproject.toml

0.48

08 Oct 15:44

  • Vector.as_datetime: Add precision argument
  • Vector.concat: New method
  • Vector.sort: Fix sorting object vectors