Develop summary write-up #12

@asteiker

Include the following components:

  • Summary of previous work
    • Why working with HDF5 in the cloud is complicated
      • Brief History of HDF
      • Latency in the cloud
      • Python drivers and I/O-bound tasks
  • Working with HDF5 in the cloud using open source tools
    • GDAL
    • h5py
    • H5Coro
    • Kerchunk
  • Performance considerations for HDF5 in the cloud and paged aggregation of nested metadata
    • HDF5 and nested metadata
    • h5repack and chunk sizes
    • Simple benchmarking
  • Potential cloud-optimized formats for HDF5 datasets: the ATL03 case
    • Raster data and N-dimensional: clear path forward
    • Point Cloud and hybrid datasets: ?
      • Zarr
      • Columnar formats for point cloud data: GeoParquet/Arrow
      • Cloud Native Data formats for PCD: https://copc.io/
    • Lessons learned
  • Benchmarking results
  • Downstream processing pipeline considerations
  • Target audience:
    • ESDIS, DAAC management
    • ICESat-2 Science Team
    • Consider providing to NISAR community
  • ATL14/15 (COG)? Provide recommendations to help conversion in future release?
