Skip to content

feat: introducing AWS Glue schema registry discovery#243

Merged
Adrian Januzi (adrian-januzi) merged 18 commits intomainfrom
feat-gluesr
Apr 7, 2026
Merged

feat: introducing AWS Glue schema registry discovery#243
Adrian Januzi (adrian-januzi) merged 18 commits intomainfrom
feat-gluesr

Conversation

@adrian-januzi
Copy link
Copy Markdown
Contributor

@adrian-januzi Adrian Januzi (adrian-januzi) commented Mar 25, 2026

Description

Adds support for discovering and migrating schemas from AWS Glue Schema Registry to Confluent Cloud. This extends KCP's schema migration capabilities beyond Confluent Schema Registry to cover AWS-native schema registries.


Changes Made

  • Discovery: New scan glue-schema-registry command that discovers schemas and versions from an AWS Glue Schema Registry, persisting results to kcp-state.json
  • Terraform generation: Generates Confluent Cloud schema resources (confluent_schema) from discovered Glue schemas, including subject configs and schema files
  • State model: Extended SchemaRegistries state to support both Confluent SR and AWS Glue SR with backward-compatible JSON unmarshaling
  • UI: Updated Schema Registries explore view to display Glue registries alongside Confluent ones, with a new migration wizard for Glue-to-Confluent schema migration
  • Glue service: New AWS Glue client and service layer for listing registries, schemas, and schema versions
  • No breaking changes — existing Confluent SR workflows are unaffected

Testing

  • Unit tests for Glue schema registry scanner (3 test cases)
  • Unit tests for Glue schema registry service (discovery, version fetching)
  • Unit tests for Confluent schema HCL generation
  • Unit tests for state backward compatibility (old array format → new object format)
  • All linter checks pass

Test Instructions:

  1. Configure AWS credentials with Glue access
  2. Run kcp scan glue-schema-registry --registry-name <name> --region <region> --state-file kcp-state.json
  3. Verify schemas appear in kcp-state.json under schema_registries.aws_glue
  4. Run kcp ui and check the Schema Registries view shows Glue registries
  5. Use the migration wizard to generate Terraform for Glue → Confluent migration

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Documentation updated
  • No breaking changes

@adrian-januzi Adrian Januzi (adrian-januzi) requested a review from a team as a code owner March 25, 2026 15:18
Combine `kcp scan schema-registry` and `kcp scan glue-schema-registry`
into a single `kcp scan schema-registry` command with a required
`--sr-type` flag (confluent | glue). This reduces CLI surface area and
establishes a consistent pattern for registry type selection.

- Add --sr-type flag with conditional flag validation in PreRunE
- Move glue scanner into schema_registry package (no cross-package imports)
- Normalize glue success output from slog.Info to fmt.Printf
- Remove glue-schema-registry as a standalone command
- Update help text with grouped flags by registry type

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix path traversal in HCL file generation by sanitizing schema names
  and validating output paths stay within target directory
- Fix potential nil panic in generateConfluentSchema by propagating errors
- Add bounded concurrency (10 workers) to GetAllSchemasWithVersions
- Thread context.Context through Glue service layer for cancellation
- Fix emoji in error message and use %w wrapping in glue client
- Add --schemas filter flag to migrate-schemas for CLI/UI parity
- Add user-facing success message for Glue migration generator
- Consolidate schema_registry_cluster_id into GlueSchemaVariables
- Extract appendVariableBlocks helper to reduce HCL generation duplication
- Use aws.ToString/aws.ToInt64 instead of manual nil checks
- Extract useToggleSet hook and memoize formatSchema in React UI
- Update docs to reflect unified scan schema-registry --sr-type command

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@adrian-januzi
Copy link
Copy Markdown
Contributor Author

Consideration: Schema version storage in state file

Currently GetAllSchemasWithVersions fetches the full SchemaDefinition string for every version of every schema and stores it inline in kcp-state.json. This is correct for migration — replaying all versions is needed to preserve compatibility level progression in Confluent Cloud — but it scales poorly. For a registry with 500 schemas and 20 versions averaging 5KB each, the state file grows to ~50MB.

A potential improvement would be to store schema definitions on disk separately (e.g., .kcp/schemas/<registry>/<name>/v<N>.avsc) rather than embedding them in the state JSON. The state file would store metadata only (version number, format, status, file path reference), keeping it lean while retaining all version data for migration replay. The HCL generator and UI would read definitions from disk instead of from the state object.

Not blocking for this PR, but worth considering before targeting large production registries.

@adrian-januzi
Copy link
Copy Markdown
Contributor Author

Having tested the above consideration at 20% the scale (103 schemas, ranging between 1-5 versions), the state file doesn't bloat as greatly as implied especially compared to the already existing cost and metrics data.

Show schema count and version totals during scan so the command
doesn't appear to hang. Uses standardised emoji set (🚀/🔍/✅).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Findings

Overall this is a well-structured PR — clean separation of concerns, good test coverage, and the backward-compatible UnmarshalJSON is nicely done. A few items worth addressing:


Critical

1. Uncommitted fix for MarkFlagRequired in PreRunE
cmd/scan/schema_registry/cmd_scan_schema_registry.go

There is an uncommitted diff in the working tree that replaces cmd.MarkFlagRequired("url") etc. inside preRunScanSchemaRegistry with explicit empty-string checks. The original code calls MarkFlagRequired inside PreRunE, which does not work as expected — cobra's MarkFlagRequired is designed to be called at command setup time (before Execute()), not inside PreRunE. When called in PreRunE, cobra has already parsed flags and will not retroactively enforce the requirement, meaning a missing --url flag would silently pass validation.

The uncommitted change (direct if url == "" { return fmt.Errorf(...) } checks) is the correct fix and should be committed.


Important

2. NewStateFrom does not copy SchemaRegistries
internal/types/state.go

NewStateFrom copies Regions but not SchemaRegistries. Any code path using NewStateFrom (e.g. cmd/discover/discoverer.go) will silently drop all schema registry data when a new discovery run starts. Worth adding:

workingState.SchemaRegistries = fromState.SchemaRegistries

3. No context cancellation on error in concurrent schema fetching
internal/services/glue_schema_registry/glue_schema_registry_service.go

GetAllSchemasWithVersions checks firstErr before starting each goroutine's work, but does not cancel the context on error. Already-started goroutines making AWS API calls will continue running after a failure. With 10 concurrent workers and many schemas, this wastes time and API calls. Fix with context.WithCancel:

ctx, cancel := context.WithCancel(ctx)
defer cancel()
// in the error path:
if firstErr == nil {
    firstErr = fmt.Errorf(...)
    cancel()
}

4. slog.Info in hot loop should be slog.Debug
internal/services/glue_schema_registry/glue_schema_registry_service.go:81

Per CLAUDE.md: slog.Info is "operational detail", slog.Debug is "verbose internal detail". The line slog.Info("fetching versions for schema", ...) runs once per schema inside a concurrent loop. For registries with hundreds of schemas this floods the log. Should be slog.Debug.

5. Glue registry lookup in parseMigrateGlueSchemasOpts ignores region
cmd/create_asset/migrate_schemas/cmd_create_asset_migrate_schemas.go

parseMigrateGlueSchemasOpts matches Glue registries only by RegistryName, but UpsertGlueSchemaRegistry matches by RegistryName + Region. If a user has the same registry name in two regions (valid — Glue names are per-account-per-region), the CLI always finds the first one. Consider adding a --region flag or matching on both fields.


Suggestions

  • formatSchemaCache (SchemaRegistries.tsx): Module-level Map is never evicted. Unlikely to matter for a single-session desktop tool, but worth noting.
  • CLI flag rename: --url to --cc-sr-rest-endpoint is a breaking change for automation. Call it out in release notes if intentional.
  • Latest pointer sharing: In the concurrent goroutine, latest = &versions[latestIdx] shares memory with the Versions slice. Consider v := versions[latestIdx]; latest = &v to avoid subtle mutation coupling.
  • Missing error-path test: GetAllSchemasWithVersions has no test for when getSchemaVersions fails for one schema. Would strengthen confidence in the concurrency pattern.

- Replace MarkFlagRequired in PreRunE with explicit empty-string checks
  (cobra ignores MarkFlagRequired after flag parsing)
- Copy SchemaRegistries in NewStateFrom to prevent data loss on re-discovery
- Add context cancellation on error in concurrent schema fetching
- Downgrade hot-loop slog.Info to slog.Debug per logging conventions
- Detect ambiguous Glue registry name across multiple regions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…uation

When the same Glue registry name exists in multiple regions, users can
now pass --region to select the correct one. Without --region, the
command errors if ambiguous rather than silently picking the first match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@adrian-januzi Adrian Januzi (adrian-januzi) merged commit 83bc3f3 into main Apr 7, 2026
1 of 2 checks passed
@adrian-januzi Adrian Januzi (adrian-januzi) deleted the feat-gluesr branch April 7, 2026 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants