Description
Describe the Feature
Add first-class support for the Apache Iceberg table format to the Terraform AWS Glue module. The module should be able to create, manage, and configure AWS Glue resources (jobs, crawlers, and catalogs) that interact with Iceberg tables on S3, applying the settings Iceberg needs for compatibility and good performance.
Expected Behavior
Users should be able to provision the relevant AWS Glue resources preconfigured to work with Iceberg tables. The module should accept Iceberg table properties as inputs and expose the options needed for catalog integration. Glue jobs created by this module should be able to read, write, and manage Iceberg datasets on S3 without additional manual configuration.
Use Case
Adoption of the Iceberg table format is increasing due to its advanced features for big data workloads, such as schema evolution, partitioning, transactional support, and time travel. Currently, users must apply extensive custom configuration to use Glue with Iceberg, which is error-prone and inconsistent. This feature will reduce onboarding complexity and help teams leverage Glue and Iceberg together out of the box.
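For context, here is a rough sketch of the kind of manual configuration this currently involves when using the plain `aws_glue_job` resource. It assumes Glue 4.0 and the `--datalake-formats` job parameter described in the AWS Glue Iceberg documentation. The variable names, job name, and the `glue_catalog` Spark catalog name are illustrative and not part of any existing module interface.

```hcl
# Sketch only: names, variables, and paths are hypothetical.
variable "glue_role_arn" {
  type        = string
  description = "ARN of an existing IAM role the job assumes."
}

variable "script_location" {
  type        = string
  description = "S3 URI of the ETL script."
}

variable "warehouse_path" {
  type        = string
  description = "S3 prefix used as the Iceberg warehouse."
}

resource "aws_glue_job" "iceberg_etl" {
  name              = "iceberg-etl"
  role_arn          = var.glue_role_arn
  glue_version      = "4.0"
  worker_type       = "G.1X"
  number_of_workers = 2

  command {
    name            = "glueetl"
    script_location = var.script_location
    python_version  = "3"
  }

  default_arguments = {
    # Tells Glue to load its bundled Iceberg libraries.
    "--datalake-formats" = "iceberg"

    # Glue accepts a single --conf argument, so the Spark settings are
    # chained into one string separated by " --conf ".
    "--conf" = join(" --conf ", [
      "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
      "spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog",
      "spark.sql.catalog.glue_catalog.warehouse=${var.warehouse_path}",
      "spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog",
      "spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO",
    ])
  }
}
```

Each job has to repeat this argument block today, which is where most of the inconsistency comes from.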
Describe Ideal Solution
- Support configuring Glue Catalogs and Databases for Iceberg table storage
- Allow jobs/crawlers to interact with Iceberg tables using required JARs and settings (e.g., enabling 'TABLE_TYPE=ICEBERG')
- Expose variables to specify Iceberg table properties
- Document all relevant options and provide examples for using the module with Iceberg tables
- Ensure compatibility with the latest AWS Glue versions supporting Iceberg
- Handle required IAM permissions and resource dependencies automatically
- (Optional) Provide guardrails/templates for common Iceberg workflows (partitioning, snapshot management, etc.)
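As a starting point for the catalog and table items above, the following sketch shows how an Iceberg table can be registered in the Glue Data Catalog with the AWS provider's `aws_glue_catalog_table` resource. It assumes a provider version that supports the `open_table_format_input` block. The database name, table name, columns, and bucket are made up for illustration, and IAM permissions are not covered here.

```hcl
# Sketch only: database, table, column, and bucket names are hypothetical.
resource "aws_glue_catalog_database" "lake" {
  name = "analytics_lake"
}

resource "aws_glue_catalog_table" "events" {
  name          = "events"
  database_name = aws_glue_catalog_database.lake.name
  table_type    = "EXTERNAL_TABLE"

  # Asks Glue to create the table as an Iceberg table (format version 2).
  open_table_format_input {
    iceberg_input {
      metadata_operation = "CREATE"
      version            = "2"
    }
  }

  storage_descriptor {
    location = "s3://example-bucket/warehouse/events/"

    columns {
      name = "event_id"
      type = "string"
    }

    columns {
      name = "event_time"
      type = "timestamp"
    }
  }
}
```

A module-level interface could wrap these resources behind a few variables (for example the table format version and storage location), but the exact inputs are left open for discussion.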
Alternatives Considered
- Relying on manual configuration of Glue jobs/resources to support Iceberg (complex, error-prone)
- Using separate Terraform modules to bridge Glue and Iceberg, which increases maintenance overhead
- Direct management of Iceberg tables without Terraform (does not scale for infrastructure as code needs)
Additional Context
- Refer to AWS official documentation for Glue-Iceberg integration: https://docs.aws.amazon.com/glue/latest/dg/iceberg.html
- This feature aligns with AWS recommendations for building modern data lakes
- Example projects/modules that implement similar support:
- Consider providing compatibility notes for AWS Glue versions and limitations (such as partition evolution or concurrent writes)
- This feature will help data engineering/data lake workloads benefit from Iceberg's features via infrastructure as code