I've been happily using feather the last year as a backend with volume metadata on the files.
But parquet is smaller than feather, and also supports volume and chunk-level metadata.
So rather than storing metadata in a separate .json file, I propose storing the data in the footer of the parquet file, and caching it when needed with pyarrow.parquet.read_metadata.
Reason:
- Fewer files. The big one for me--taking up 2x as many inodes is a problem in an HPC environment.
- More portable
Downsides:
- Requires a pyarrow dependency for parquet, rather than flexibility about using fastparquet instead.
- Requires either breaking exist pyarrow installations, or some term-unlimited multiple format support.