Releases: otsaloma/dataiter
1.2
1.1
1.0
- Silence warnings about writing NPZ files with StringDType: "UserWarning: Custom dtypes are saved as python objects using the pickle protocol. Loading this file requires allow_pickle=True to be set."
Dataiter can now be considered stable. If upgrading from <= 0.51, please read the release notes for 0.99–0.9999.
0.9999
- New module dataiter.regex for vectorized regular expressions
- Add proxy object Vector.dt for dataiter.dt
- Add proxy object Vector.re for dataiter.regex
- Add proxy object Vector.str for numpy.strings
- Use PyArrow instead of Pandas to read and write CSV files
- Replace Pandas dependency with PyArrow
This is likely to be a breaking change for rare, unusually formatted
CSV files that Pandas and PyArrow parse differently, resulting in, for
example, differently guessed data types or differently detected
missing value markers. The note about stability under release 0.99 below
still applies.
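To illustrate what a proxy object such as Vector.re provides, here is a minimal sketch of the general pattern using only the standard library. The classes ToyVector and RegexProxy are hypothetical and are not Dataiter's implementation. They only show how an attribute on a vector can forward to regular expression functions applied element by element.

```python
import re


class RegexProxy:
    """Hypothetical proxy that applies `re` functions to each element."""

    def __init__(self, values):
        self._values = values

    def findall(self, pattern):
        # One list of matches per element
        return [re.findall(pattern, x) for x in self._values]

    def sub(self, pattern, repl):
        # One substituted string per element
        return [re.sub(pattern, repl, x) for x in self._values]


class ToyVector(list):
    """Hypothetical stand-in for a vector of strings."""

    @property
    def re(self):
        # Accessing .re returns a proxy bound to this vector's elements
        return RegexProxy(self)


codes = ToyVector(["ab-12", "cd-345", "ef"])
digits = codes.re.findall(r"\d+")    # [['12'], ['345'], []]
cleaned = codes.re.sub(r"-\d+", "")  # ['ab', 'cd', 'ef']
```

The point of the pattern is namespacing: related functions are grouped behind one attribute rather than added as many separate methods on the vector itself.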
0.999
- DataFrame.from_arrow: Remove strings_as_object argument
- DataFrame.from_pandas: Remove strings_as_object argument
- DataFrame.read_csv: Remove strings_as_object argument
- DataFrame.read_parquet: Remove strings_as_object argument
- GeoJSON.read: Remove strings_as_object argument
- ListOfDicts.to_data_frame: Remove strings_as_object argument
- read_csv: Remove strings_as_object argument
- read_geojson: Remove strings_as_object argument
- read_parquet: Remove strings_as_object argument
- Vector.as_string: Remove length argument
- Vector.is_na: Fix to work in multidimensional cases where the elements of an object vector are arrays/vectors
- Vector.rank: Change default method to "min"
- Vector.rank: Remove method "average"
This is a breaking change to switch the string data type from the
fixed-width str_ a.k.a. <U# to the variable-width StringDType
introduced in NumPy 2.0. The main benefit is greatly reduced memory use,
making strings usable without having to be careful or fall back to the
object dtype. The note about stability under release 0.99 below still applies.
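A rough back-of-the-envelope calculation shows where the memory savings come from. A fixed-width <U# array reserves 4 bytes per character for every element, with the width set by the longest string, so a single long outlier inflates the whole array. The sketch below uses plain Python and hypothetical helper names. The variable-width figure counts only the UTF-8 bytes of the strings and ignores any per-element overhead, so it is a lower bound rather than the actual footprint of StringDType.

```python
def fixed_width_bytes(strings):
    """Bytes a NumPy <U# array would need: 4 bytes per character slot."""
    width = max((len(s) for s in strings), default=0)
    return 4 * width * len(strings)


def utf8_payload_bytes(strings):
    """Total UTF-8 bytes of the strings themselves (no per-element overhead)."""
    return sum(len(s.encode("utf-8")) for s in strings)


# 999 short strings plus one long outlier
strings = ["ok"] * 999 + ["x" * 1000]
fixed = fixed_width_bytes(strings)     # 4 * 1000 * 1000 = 4_000_000
payload = utf8_payload_bytes(strings)  # 999 * 2 + 1000 = 2_998
```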
Note that as StringDType is only in NumPy >= 2.0, any NPZ or Pickle
files saved cannot be opened using Dataiter < 0.99 and NumPy < 2.0. If
you need that kind of interoperability, consider using the Parquet file
format.
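For the Vector.rank change listed above, the difference between the "min" and "average" methods can be sketched in plain Python. The functions below are hypothetical and follow the usual conventions for these method names (as in, for example, scipy.stats.rankdata). That Dataiter matches them in every detail, such as ranks starting at 1, is an assumption.

```python
def rank_min(values):
    """Rank values from 1; tied values all get the lowest rank of their group."""
    ordered = sorted(values)
    # list.index returns the first position of x in the sorted list
    return [ordered.index(x) + 1 for x in values]


def rank_average(values):
    """Rank values from 1; tied values get the mean rank of their group."""
    ordered = sorted(values)
    ranks = []
    for x in values:
        first = ordered.index(x) + 1
        last = first + ordered.count(x) - 1
        ranks.append((first + last) / 2)
    return ranks


scores = [10, 20, 20, 30]
min_ranks = rank_min(scores)      # [1, 2, 2, 4]
avg_ranks = rank_average(scores)  # [1.0, 2.5, 2.5, 4.0]
```

With "min", ranks stay integers, whereas "average" produces fractional ranks for ties.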
0.99
- Adapt to changes in NumPy 2.0
- Bump NumPy dependency to >= 2.0
This is a minimal change to be NumPy 2.0 compatible. In the 0.99+
releases, we plan to adopt the new NumPy string dtype and fix any
regressions that come up, leading to a 1.0 release when everything looks
to be working reliably (#26). Anyone looking for extreme stability
should consider avoiding the 0.99+ releases and waiting for 1.0.
0.51
0.50
- ListOfDicts.drop_na: New method
- ListOfDicts.keys: New method
- ListOfDicts.print_memory_use: New method
- Fix tabular display of Unicode characters with width != 1
- Add dependency on wcwidth: https://pypi.org/project/wcwidth
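The display fix above matters because some characters occupy two terminal columns (for example CJK ideographs) and combining marks occupy none, so padding by len() misaligns table columns. The sketch below approximates display width with the standard library's unicodedata module instead of wcwidth itself. The function names are hypothetical, and wcwidth covers more cases than this approximation.

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal columns that text occupies."""
    width = 0
    for char in text:
        if unicodedata.combining(char):
            continue  # combining marks add no width
        # Wide (W) and fullwidth (F) characters take two columns
        width += 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1
    return width


def pad(text, columns):
    """Left-justify text to the given number of terminal columns."""
    return text + " " * max(0, columns - display_width(text))


# Each row ends at the same terminal column despite differing len()
for name in ["abc", "日本", "e\u0301"]:
    print(pad(name, 6) + "|")
```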