Candidate criteria:
Formats / chunking schemes to compare
Re-chunked HDF5
Cloud-optimized HDF5
GeoParquet
Zarr
Kerchunk JSON
h5coro
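Each chunking scheme trades request count against bytes fetched: every chunk a read touches usually costs at least one ranged request from object storage, and whole chunks are fetched even when only part of one is needed. A minimal sketch, assuming a 1-D array (like a per-beam photon variable) split into fixed-size chunks, and ignoring compression and metadata reads; the helper names are hypothetical:

```python
def chunks_touched(start: int, stop: int, chunk_len: int) -> int:
    """Number of fixed-size chunks overlapped by the half-open index range [start, stop)."""
    if stop <= start:
        return 0
    first = start // chunk_len       # chunk holding the first requested element
    last = (stop - 1) // chunk_len   # chunk holding the last requested element
    return last - first + 1


def bytes_fetched(start: int, stop: int, chunk_len: int, itemsize: int) -> int:
    """Uncompressed bytes pulled in, assuming every touched chunk is read whole."""
    return chunks_touched(start, stop, chunk_len) * chunk_len * itemsize


# Same 100,000-element read under two chunk sizes (4-byte values assumed):
for chunk_len in (10_000, 1_000_000):
    print(chunk_len, chunks_touched(0, 100_000, chunk_len), bytes_fetched(0, 100_000, chunk_len, 4))
```

Small chunks mean many requests with little wasted data; large chunks mean few requests but up to 10x more bytes than needed in this example, which is one reason to benchmark re-chunked and cloud-optimized variants side by side.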
Environment
CryoCloud - Small instance
Assume all example files will be stored in CryoCloud (i.e., in Sync or shared_public)
Libraries or clients used to open/read data
For each format option:
Dataset(s)
Based on community feedback/discussion, initial focus on ATL03
Files
Single and multiple files? File sizes can vary by several GB; ideally, produce and test 10 files
Variable(s)
Spatial subset(s)
Temporal subset(s)
Aggregation
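To keep results comparable across formats, the same spatial and temporal subset and the same aggregation should be applied to each. A minimal pure-Python sketch of that query, assuming photons flattened into simple records; the `Photon` type and its fields are hypothetical stand-ins for ATL03 per-beam arrays such as `lat_ph`, `lon_ph`, `delta_time` and `h_ph`:

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class Photon:
    # Hypothetical flattened record; ATL03 actually stores these as separate per-beam arrays.
    lat: float
    lon: float
    time: float  # seconds since some reference epoch (assumed)
    h: float     # height in metres


def subset(photons: List[Photon], bbox: Tuple[float, float, float, float],
           t0: float, t1: float) -> List[Photon]:
    """Keep photons inside bbox=(lon_min, lat_min, lon_max, lat_max) with t0 <= time < t1."""
    lon_min, lat_min, lon_max, lat_max = bbox
    return [
        p for p in photons
        if lon_min <= p.lon <= lon_max
        and lat_min <= p.lat <= lat_max
        and t0 <= p.time < t1
    ]


def mean_height(photons: List[Photon]) -> Optional[float]:
    """Aggregate a subset to a single mean height; None if the subset is empty."""
    return mean(p.h for p in photons) if photons else None
```

In a real benchmark each format/library pair would express this predicate in its own API (for example, a GeoPandas spatial filter or an xarray selection); fixing it up front means the timings measure the format, not the query.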
End-to-end wall clock time
Time to re-chunk or reformat
Time to open/read file
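Wall-clock timings for cloud reads vary between runs, and the first run often pays extra for metadata fetches that later runs may have cached. A minimal stdlib timing sketch, assuming each step (re-chunk, open, read) can be wrapped as a zero-argument callable; `time_call` is a hypothetical helper, not part of any library under test:

```python
import statistics
import time
from typing import Callable, Dict, List


def time_call(fn: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Run fn `repeats` times; report the first (cold) run and min/median/max in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    durations: List[float] = []
    for _ in range(repeats):
        t0 = time.perf_counter()  # monotonic, high-resolution wall clock
        fn()
        durations.append(time.perf_counter() - t0)
    return {
        "first": durations[0],  # likely includes cold-cache / metadata cost
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

Usage would look like `time_call(lambda: read_subset(path))`, where `read_subset` is a hypothetical wrapper around whichever library and format is being tested. Reporting the first run separately keeps cold and warm behaviour from being averaged together.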
Multiple tools/libraries/clients to compare per format option?
GeoPandas, xarray
Should we consider Dask DataFrame?
Compute cost
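Compute cost can be estimated from the measured wall-clock time plus the number of storage requests a format generates. A back-of-the-envelope sketch; the hourly instance rate and per-request price are hypothetical inputs to be filled in from actual CryoCloud and storage pricing, not known values:

```python
import math


def compute_cost(wall_seconds: float, hourly_rate: float,
                 n_get_requests: int = 0, price_per_1000_gets: float = 0.0) -> float:
    """Estimated cost: instance-hours x hourly rate, plus GET requests x per-1,000 price."""
    instance_cost = wall_seconds / 3600.0 * hourly_rate
    request_cost = n_get_requests / 1000.0 * price_per_1000_gets
    return instance_cost + request_cost


# Hypothetical inputs: 1 hour at 0.50/hour plus 10,000 GETs at 0.0004 per 1,000.
print(math.isclose(compute_cost(3600, 0.50, 10_000, 0.0004), 0.504))
```

Separating the two terms shows whether a format is expensive because it is slow or because it issues many small requests.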
Do we include a real-world example?
Time series of 60 day repeat cycle
Real-world example tie-in: Jakobshavn surface height
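A time series over repeat cycles reduces to grouping subsetted heights by cycle and aggregating each group. A minimal sketch, assuming the subset step has already produced `(cycle, height)` pairs; the pair layout is hypothetical, with the cycle number assumed to come from each granule's metadata:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def cycle_time_series(records: Iterable[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Mean height per repeat cycle from (cycle, height) pairs, ordered by cycle."""
    by_cycle: Dict[int, List[float]] = defaultdict(list)
    for cycle, h in records:
        by_cycle[cycle].append(h)
    return [(c, mean(hs)) for c, hs in sorted(by_cycle.items())]
```

Because this step runs after the subset, its cost is small; the benchmark-relevant part is how quickly each format delivers the per-cycle subsets.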