Develop benchmarking criteria for consistent comparison across format options

Candidate criteria:

- Formats / chunking schemes to compare
  - Re-chunked HDF5
  - Cloud-optimized HDF5
  - GeoParquet
  - Zarr
  - Kerchunk JSON
  - h5coro
- Environment
  - CryoCloud - Small instance
  - Assume we'll store all example files in CryoCloud (i.e. Sync or shared_public)
- Libraries or clients used to open/read data
- For each format option:
  - Dataset(s)
    - Based on community feedback/discussion, initial focus on ATL03
  - Files
    - Single and multiple? File sizes can vary by several GB; ideally produce and test 10 files
  - Variable(s)
  - Spatial subset(s)
  - Temporal subset(s)
  - Aggregation
  - End-to-end wall clock time
    - Time to re-chunk or reformat
    - Time to open/read file
      - Multiple tools/libraries/clients to compare per format option?
        - GeoPandas, xarray
        - Should we also consider Dask DataFrame?
  - Compute cost
  - Do we include a real-world example?
    - Time series of 60-day repeat cycle
    - Real-world example tie-in: Jakobshavn surface height
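
To make the wall-clock criterion concrete, here is a minimal sketch of a timing harness that runs every format/reader combination the same way. It is an assumption about how the comparison could be organized, not an agreed design: the format and reader labels, `time_call`, `run_matrix`, and the `make_reader` factory are all hypothetical names, and the actual open/read calls for each format are left to the caller.

```python
import itertools
import statistics
import time
from typing import Callable, Dict, List, Optional

# Hypothetical labels for the candidates listed above; adjust as the list changes.
FORMATS = [
    "rechunked-hdf5",
    "cloud-optimized-hdf5",
    "geoparquet",
    "zarr",
    "kerchunk-json",
    "h5coro",
]
READERS = ["geopandas", "xarray"]


def time_call(fn: Callable[[], object], repeats: int = 3) -> Dict[str, float]:
    """Run fn `repeats` times and summarize the wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "min_s": min(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
        "repeats": repeats,
    }


def run_matrix(
    make_reader: Callable[[str, str], Optional[Callable[[], object]]],
    formats: List[str] = FORMATS,
    readers: List[str] = READERS,
    repeats: int = 3,
) -> List[Dict[str, object]]:
    """Time every (format, reader) pair for which make_reader returns a callable.

    make_reader returns None for pairs that do not apply, so those are skipped
    rather than reported as failures.
    """
    results = []
    for fmt, reader in itertools.product(formats, readers):
        fn = make_reader(fmt, reader)
        if fn is None:
            continue
        row: Dict[str, object] = {"format": fmt, "reader": reader}
        row.update(time_call(fn, repeats))
        results.append(row)
    return results
```

Reporting the minimum, median, and maximum over several repeats, rather than a single run, helps separate cold-cache effects from steady-state read time. The same harness could wrap the re-chunk/reformat step so that both halves of the end-to-end time are measured consistently.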

