Releases · otsaloma/dataiter

Silence warnings about writing NPZ files with StringDType: "UserWarning: Custom dtypes are saved as python objects using the pickle protocol. Loading this file requires allow_pickle=True to be set."
Dataiter can now be considered stable. If upgrading from <= 0.51, please read the release notes for 0.99–0.9999.
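The warning silenced above is a plain `UserWarning`, so if you write NPZ files yourself you can suppress it in the same targeted way. Below is a minimal, stdlib-only sketch. It assumes you want to ignore only this one message inside one block of code. `save_npz` is a hypothetical stand-in that merely emits the warning. It is not Dataiter's or NumPy's code.

```python
import warnings

MESSAGE = ("Custom dtypes are saved as python objects using the pickle protocol. "
           "Loading this file requires allow_pickle=True to be set.")

def save_npz():
    # Hypothetical stand-in for a writer that triggers the warning.
    warnings.warn(MESSAGE, UserWarning)
    return "saved"

def save_npz_quietly():
    # Ignore only this specific warning, and only inside this block;
    # the previous filter state is restored when the block exits.
    with warnings.catch_warnings():
        # "message" is a regex matched against the start of the warning text.
        warnings.filterwarnings("ignore", message="Custom dtypes are saved",
                                category=UserWarning)
        return save_npz()
```

Filtering on the message rather than on all `UserWarning`s keeps unrelated warnings visible.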

12 Jan 19:32

otsaloma

0.9999

851d961

New module dataiter.regex for vectorized regular expressions
Add proxy object Vector.dt for dataiter.dt
Add proxy object Vector.re for dataiter.regex
Add proxy object Vector.str for numpy.strings
Use PyArrow instead of Pandas to read and write CSV files
Replace Pandas dependency with PyArrow

This is likely to be a breaking change for some rare, unusually formatted
CSV files that Pandas and PyArrow parse differently, resulting in, for
example, differently guessed data types or differently detected missing
value markers. The note about stability below release 0.99 still
applies.
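The proxy objects listed above (Vector.dt, Vector.re, Vector.str) expose module-level functions as methods on a vector. Below is a minimal, stdlib-only sketch of that pattern. It assumes the proxy simply forwards each call to the module function, passing the vector as the first argument. `Vector`, `Proxy`, `regex` and the `regex_*` helpers are all hypothetical illustrations. They are not Dataiter's implementation and do not reproduce its API.

```python
import functools
import re
import types

def regex_match(vector, pattern):
    # Elementwise: True where the pattern matches at the start of the string.
    return [re.match(pattern, x) is not None for x in vector]

def regex_sub(vector, pattern, repl):
    # Elementwise substitution of every non-overlapping match.
    return [re.sub(pattern, repl, x) for x in vector]

# Hypothetical stand-in for a module of vectorized functions.
regex = types.SimpleNamespace(match=regex_match, sub=regex_sub)

class Proxy:
    """Expose namespace functions as methods bound to one vector."""

    def __init__(self, vector, namespace):
        self._vector = vector
        self._namespace = namespace

    def __getattr__(self, name):
        # Called only for names not found normally, e.g. "sub".
        func = getattr(self._namespace, name)
        # Pre-fill the vector as the first argument.
        return functools.partial(func, self._vector)

class Vector(list):
    @property
    def re(self):
        # A fresh, lightweight proxy on each access.
        return Proxy(self, regex)
```

With this sketch, `Vector(["a1", "b2", "c"]).re.sub(r"\d", "")` returns `['a', 'b', 'c']`, the same as calling `regex_sub` directly. Forwarding keeps a single implementation of each function, with the method form acting only as a convenience.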

15 Dec 14:54

otsaloma

0.999

472f147

DataFrame.from_arrow: Remove strings_as_object argument
DataFrame.from_pandas: Remove strings_as_object argument
DataFrame.read_csv: Remove strings_as_object argument
DataFrame.read_parquet: Remove strings_as_object argument
GeoJSON.read: Remove strings_as_object argument
ListOfDicts.to_data_frame: Remove strings_as_object argument
read_csv: Remove strings_as_object argument
read_geojson: Remove strings_as_object argument
read_parquet: Remove strings_as_object argument
Vector.as_string: Remove length argument
Vector.is_na: Fix to work in multidimensional cases where the elements of an object vector are arrays/vectors
Vector.rank: Change default method to "min"
Vector.rank: Remove method "average"
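With the "min" method, tied elements all receive the lowest rank of their group, as in SQL's RANK(). Below is a stdlib sketch of that definition. It assumes 1-based ranks. `rank_min` is a hypothetical helper and not Dataiter's implementation, which works on NumPy arrays.

```python
def rank_min(values):
    # Positions sorted by value; sorted() is stable, so ties keep input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order):
        previous = order[position - 1] if position > 0 else None
        if previous is not None and values[i] == values[previous]:
            # Tie: reuse the rank already given to the equal element.
            ranks[i] = ranks[previous]
        else:
            # New value: rank is its 1-based position in sorted order.
            ranks[i] = position + 1
    return ranks
```

For example, `rank_min([30, 10, 20, 10])` gives `[4, 1, 3, 1]`. Under the removed "average" method, the two tied 10s would each have received 1.5 instead.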

This is a breaking change that switches the string data type from the
fixed-width str_ (a.k.a. <U#) to the variable-width StringDType
introduced in NumPy 2.0. The main benefit is greatly reduced memory use,
which makes strings usable without careful memory management or falling
back to the object dtype. The note about stability below release 0.99
still applies.
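The fixed-width str_ dtype pads every element to the length of the longest string and stores 4 bytes per character (UCS-4). Below is a back-of-the-envelope, stdlib-only sketch of that cost. `fixed_width_bytes` and `payload_bytes` are hypothetical helpers. The variable-width figure is only a lower bound, because it ignores StringDType's own per-element overhead. Treat the comparison as an illustration of why a single long string inflates a `<U#` array, not as a measurement.

```python
def fixed_width_bytes(strings):
    # <U# reserves max_len code points per element, 4 bytes each.
    width = max((len(s) for s in strings), default=0)
    return len(strings) * width * 4

def payload_bytes(strings):
    # Lower bound for a variable-width layout: just the UTF-8 text.
    return sum(len(s.encode("utf-8")) for s in strings)

# 1000 short strings plus one long outlier.
strings = ["ab"] * 1000 + ["x" * 500]
fixed = fixed_width_bytes(strings)    # 1001 * 500 * 4 bytes
variable = payload_bytes(strings)     # 1000 * 2 + 500 bytes
```

Here, `fixed` is 2,002,000 bytes, while the text itself needs only 2,500 bytes.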

Note that as StringDType exists only in NumPy >= 2.0, NPZ or Pickle
files saved with this release cannot be opened using Dataiter < 0.99 and
NumPy < 2.0. If you need that kind of interoperability, consider using
the Parquet file format.

17 Aug 17:56

otsaloma

0.99

6d473ed

Adapt to changes in NumPy 2.0
Bump NumPy dependency to >= 2.0

This is a minimal change to be NumPy 2.0 compatible. In the 0.99+
releases, we plan to adopt the new NumPy string dtype and fix any
regressions that come up, leading to a 1.0 release once everything
appears to work reliably (#26). Anyone looking for extreme stability
should consider avoiding the 0.99+ releases and waiting for 1.0.

24 Jun 19:56

otsaloma

0.51

e8a5d62

Mark NumPy dependency as < 2.0

05 Apr 21:41

otsaloma

0.50

d78775e

ListOfDicts.drop_na: New method
ListOfDicts.keys: New method
ListOfDicts.print_memory_use: New method
Fix tabular display of Unicode characters with width != 1
Add dependency on wcwidth: https://pypi.org/project/wcwidth
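Characters such as CJK ideographs occupy two terminal columns, so padding table cells by `len()` misaligns them. This release uses wcwidth for the purpose. The sketch below instead substitutes a rougher stdlib approximation based on `unicodedata.east_asian_width`. It counts "W" and "F" characters as two columns and combining marks as zero. It is not wcwidth's algorithm, and `display_width` and `pad` are hypothetical helpers, not Dataiter's code.

```python
import unicodedata

def display_width(text):
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks draw over the previous character.
            continue
        # Wide and fullwidth characters take two columns, others one.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

def pad(text, width):
    # Left-align text in a column of the given display width.
    return text + " " * max(width - display_width(text), 0)
```

For example, `display_width("日本")` is 4 even though `len("日本")` is 2, so `pad("日本", 6)` adds only two spaces.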

08 Nov 19:29

otsaloma

0.49

a562911

dt: Handle all-NaT input
Migrate from setup.py to hatch and pyproject.toml

08 Oct 15:44

otsaloma

0.48

222f630

Vector.as_datetime: Add precision argument
Vector.concat: New method
Vector.sort: Fix sorting object vectors
