Skip to content

Conversation

@shraddhabang
Copy link
Collaborator

@shraddhabang shraddhabang commented Nov 6, 2025

Description

This PR introduces support for AGA deployer allowing Kubernetes users to manage Global Accelerator resources through Kubernetes custom resources. I have divided this PR into smaller commits to make it easier to review.

Here is the brief description on each of them

  1. Setup AGA SDK Client (a39933f)
    Adds AWS Global Accelerator SDK integration with the controller. Implements client interfaces, mock implementations, and cloud provider extensions to interact with the Global Accelerator service. Updates dependencies to include required AWS SDK packages for Global Accelerator.

  2. Add AGA CRD Status Updates Utils (c34f2eb)
    Implements status updating utilities for Global Accelerator CRDs. Provides mechanisms to reflect the state of AWS Global Accelerator resources back to Kubernetes custom resources, enabling users to track resource status, lifecycle events, and error conditions directly in Kubernetes.

  3. Add AGA Tags Reconciler (9484b80)
    Implements tagging functionality for Global Accelerator resources to maintain proper resource ownership and tracking. Provides interfaces and implementations for applying, updating, and reconciling tags on AWS Global Accelerator resources with comprehensive test coverage. Please note I have moved tracking tags to deployer instead of builders. Also I have added one more region specific tags so that multiple controller in same account in different region do not override each other

  4. Add AGA Controller Config Flags (9695137)
    I have disabled AGAController for now until its ready for GA. We will make this enabled by default when we are ready to release this for prod. We have also added exponential back-off duration flags for AGA controller which customer can configure. I wanna also know if we should make requeue time configurable as well.

  5. Add AGA Deployer (47a7bec)
    Implements the core deployment logic for Global Accelerator resources. This includes:

    • Accelerator Manager for coordinating resource lifecycle operations
    • Synthesizer for transforming Kubernetes model objects into AWS resources
    • Stack deployer for managing resource dependencies and deployment ordering
    • Error handling and lifecycle management for Global Accelerator resources
  6. update controller-gen version(8dfe522)
    Updating the controller-gen version to use v0.19.0
    To fix the failing CRD verifications in CI/CD

TODO: I will add e2e tests once we build all the resources separately as this was already getting very big PR.
TODO: I have not added IAM policies. I wanna add when we are done with the upcoming release.

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 6, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shraddhabang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 6, 2025
return nil, err
}

// Add tracking tags (includes cluster tag and stack tag)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are adding these tracking tags in deployer phase now. So dont need these here.

}

// ARN not in cache, need to fetch from RGT API
tags, err := m.describeResourceTagsFromRGT(ctx, arn)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For AGAController, we will need RGT APIs to be enabled by default so using these here as first check and fallback to AGA API in case of failures.

// Status constants
StatusDeployed = "DEPLOYED"
StatusInProgress = "IN_PROGRESS"
StatusDeleting = "Deleting"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistentcy, should this be StatusDeleting = "DELETING"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AGA doesn't have a service-side construct of deleting - once it's deleted, it's gone.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used for displaying on CRD while its being deleted since it may take some time for accelerator to be disabled. So till that time, the CRD resource will be in deleting status.

}

// Set status to "Deleting" to indicate it's in the process of being deleted
if ga.Status.Status == nil || *ga.Status.Status != StatusDeleting {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the intent here : if status isn't already 'Deleting', then set it to 'Deleting ?

API seems like does not have 'Delete' status. Should the CRD match API ?

Status -> (string)

Describes the deployment status of the accelerator.

Possible values:

DEPLOYED
IN_PROGRESS

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specifically want to show deleting status on our resource instead of In_progess because its much more clear. Deletion through the controller is two step process(Disable and Delete which can take some time) which we handle for them. And disabling can take upto 5 minutes. So they need to differentiate between create/updates/delete calls.

}

// Update dual stack DNS name if available
if accelerator.Status.DualStackDNSName != "" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a lot of places. We are just setting status. For example, set DualStackDNSName here. Do we also need to clear DualStackDNSName in some case ?

I checked how ingress update status. They are replace entire arrays of status, which automatically clear .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Yes we need to clear this.

r.logger.Error(updateErr, "Failed to update status during accelerator deletion")
}
// Requeue after 30 seconds to check again
return ctrlerrors.NewRequeueNeededAfter("Waiting for accelerator to be disabled", 30*time.Second)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a pattern for exponential backoff available? Some kind of ramp up would be good if we have a pattern for it already because most updates are fully applied within a minute or two, but sometimes changes get delayed for longer. Some kind of exponential or planned backoff would provide a better (faster) experience when changes are flowing at the usual rate.

acceleratorManager := r.stackDeployer.GetAcceleratorManager()
r.logger.Info("Deleting accelerator", "acceleratorARN", acceleratorARN, "globalAccelerator", k8s.NamespacedName(ga))

// Create accelerator with ARN to delete

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) A different phrasing than "create accelerator" might be good here, since we aren't creating an accelerator in the AWS-resource sense.


// Step 1: Try to disable the accelerator first if it's enabled
m.logger.Info("Disabling accelerator before deletion", "acceleratorARN", acceleratorARN)
isAlreadyDeleted, err := m.disableAccelerator(ctx, acceleratorARN)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid a second describe call in disableAccelerator if we pass in the sdkAccelerator here? (or is it stale and in need of refreshing?)

// Status constants
StatusDeployed = "DEPLOYED"
StatusInProgress = "IN_PROGRESS"
StatusDeleting = "Deleting"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AGA doesn't have a service-side construct of deleting - once it's deleted, it's gone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants