Add structuretoparquet: JSON Structure to Parquet conversion #48

@clemensv

Overview

Implement conversion from JSON Structure schemas to Apache Parquet schemas.

Requirements

This conversion should:

  1. Lean on the corresponding Avro conversion (avrotoparquet) as precedent for output structure, including use of Jinja templates where applicable
  2. Cover the full breadth of the JSON Structure Core spec as defined in draft-vasters-json-structure-core-00
  3. Follow the patterns established by structuretocsharp and structuretopython, including their continued support for Avro schemas

Implementation Guidance

  • Review avrotize/avrotoparquet.py for output patterns and template usage
  • Review avrotize/structuretocsharp.py and avrotize/structuretopython.py for the JSON Structure handling patterns
  • Ensure all JSON Structure Core types are supported:
    • JSON Primitive Types: string, number, boolean, null
    • Extended Primitive Types: binary, int8 through int128, uint8 through uint128, float8/float/double, decimal, date, datetime, time, duration, uuid, uri, jsonpointer
    • Compound Types: object, array, set, map, tuple, any, choice (both tagged and inline unions)
  • Support JSON Structure-specific features:
    • Namespaces and definitions
    • Type references ($ref)
    • Extensions ($extends) and add-ins ($offers/$uses)
    • Abstract types
    • Required/optional properties
    • Type annotations (maxLength, precision, scale, contentEncoding, etc.)
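
As a rough illustration of the type coverage and the required/optional handling listed above, here is a minimal sketch of how a flat JSON Structure object might map to Parquet fields. Everything in it is an assumption made for illustration: the names PRIMITIVES, field_for and object_to_fields are hypothetical, the output is a plain dict rather than whatever avrotoparquet actually emits, "required" is treated as a flat list of names, and the chosen Parquet physical/logical pairs are only one plausible mapping. Compound types, $ref, $extends, int128/uint128, float8 and duration are deliberately left out.

```python
from typing import Any, Dict, List, Optional, Tuple

# (physical type, logical annotation) as named in the Parquet format spec.
ParquetType = Tuple[str, Optional[str]]

# Hypothetical mapping table; the choices are assumptions, not avrotize behavior.
PRIMITIVES: Dict[str, ParquetType] = {
    "string": ("BYTE_ARRAY", "STRING"),
    "boolean": ("BOOLEAN", None),
    "number": ("DOUBLE", None),
    "binary": ("BYTE_ARRAY", None),
    "float": ("FLOAT", None),
    "double": ("DOUBLE", None),
    "date": ("INT32", "DATE"),
    "time": ("INT64", "TIME(MICROS, true)"),
    "datetime": ("INT64", "TIMESTAMP(MICROS, true)"),
    "uuid": ("FIXED_LEN_BYTE_ARRAY(16)", "UUID"),
    "uri": ("BYTE_ARRAY", "STRING"),
    "jsonpointer": ("BYTE_ARRAY", "STRING"),
}

# Integers up to 32 bits fit in INT32, 64-bit integers need INT64.
for bits in (8, 16, 32, 64):
    physical = "INT32" if bits <= 32 else "INT64"
    PRIMITIVES[f"int{bits}"] = (physical, f"INT({bits}, true)")
    PRIMITIVES[f"uint{bits}"] = (physical, f"INT({bits}, false)")


def field_for(name: str, prop: Dict[str, Any], required: bool) -> Dict[str, Any]:
    """Describe one Parquet field for a primitive-typed property."""
    type_name = prop.get("type")
    if type_name == "decimal":
        # precision/scale come from the type annotations; defaults are assumed.
        precision = prop.get("precision", 38)
        scale = prop.get("scale", 0)
        physical: str = "FIXED_LEN_BYTE_ARRAY"
        logical: Optional[str] = f"DECIMAL({precision}, {scale})"
    elif type_name in PRIMITIVES:
        physical, logical = PRIMITIVES[type_name]
    else:
        raise NotImplementedError(f"type not covered by this sketch: {type_name!r}")
    return {
        "name": name,
        # Required properties become REQUIRED fields, all others OPTIONAL.
        "repetition": "REQUIRED" if required else "OPTIONAL",
        "physical": physical,
        "logical": logical,
    }


def object_to_fields(schema: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert the properties of a flat object schema into field descriptions."""
    required = set(schema.get("required", []))
    return [
        field_for(name, prop, name in required)
        for name, prop in schema.get("properties", {}).items()
    ]


example = {
    "type": "object",
    "properties": {
        "id": {"type": "uuid"},
        "price": {"type": "decimal", "precision": 10, "scale": 2},
        "count": {"type": "uint16"},
    },
    "required": ["id"],
}
fields = object_to_fields(example)
```

In this sketch, "id" comes out as a REQUIRED field while "price" and "count" are OPTIONAL, and the precision and scale annotations carry through to the DECIMAL logical type. A real converter would also need a decision for types with no direct Parquet counterpart.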

References

Metadata

Labels

enhancement (New feature or request)
