-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[feat aga] Add AGA accelerator deployer #4434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: shraddhabang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
47a7bec to
ae22611
Compare
ae22611 to
8dfe522
Compare
| return nil, err | ||
| } | ||
|
|
||
| // Add tracking tags (includes cluster tag and stack tag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are adding these tracking tags in deployer phase now. So dont need these here.
| } | ||
|
|
||
| // ARN not in cache, need to fetch from RGT API | ||
| tags, err := m.describeResourceTagsFromRGT(ctx, arn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For AGAController, we will need RGT APIs to be enabled by default so using these here as first check and fallback to AGA API in case of failures.
| // Status constants | ||
| StatusDeployed = "DEPLOYED" | ||
| StatusInProgress = "IN_PROGRESS" | ||
| StatusDeleting = "Deleting" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for consistentcy, should this be StatusDeleting = "DELETING"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AGA doesn't have a service-side construct of deleting - once it's deleted, it's gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used for displaying on CRD while its being deleted since it may take some time for accelerator to be disabled. So till that time, the CRD resource will be in deleting status.
| } | ||
|
|
||
| // Set status to "Deleting" to indicate it's in the process of being deleted | ||
| if ga.Status.Status == nil || *ga.Status.Status != StatusDeleting { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the intent here : if status isn't already 'Deleting', then set it to 'Deleting ?
API seems like does not have 'Delete' status. Should the CRD match API ?
Status -> (string)
Describes the deployment status of the accelerator.
Possible values:
DEPLOYED
IN_PROGRESS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I specifically want to show deleting status on our resource instead of In_progess because its much more clear. Deletion through the controller is two step process(Disable and Delete which can take some time) which we handle for them. And disabling can take upto 5 minutes. So they need to differentiate between create/updates/delete calls.
| } | ||
|
|
||
| // Update dual stack DNS name if available | ||
| if accelerator.Status.DualStackDNSName != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a lot of places. We are just setting status. For example, set DualStackDNSName here. Do we also need to clear DualStackDNSName in some case ?
I checked how ingress update status. They are replace entire arrays of status, which automatically clear .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Yes we need to clear this.
| r.logger.Error(updateErr, "Failed to update status during accelerator deletion") | ||
| } | ||
| // Requeue after 30 seconds to check again | ||
| return ctrlerrors.NewRequeueNeededAfter("Waiting for accelerator to be disabled", 30*time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a pattern for exponential backoff available? Some kind of ramp up would be good if we have a pattern for it already because most updates are fully applied within a minute or two, but sometimes changes get delayed for longer. Some kind of exponential or planned backoff would provide a better (faster) experience when changes are flowing at the usual rate.
| acceleratorManager := r.stackDeployer.GetAcceleratorManager() | ||
| r.logger.Info("Deleting accelerator", "acceleratorARN", acceleratorARN, "globalAccelerator", k8s.NamespacedName(ga)) | ||
|
|
||
| // Create accelerator with ARN to delete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) A different phrasing than "create accelerator" might be good here, since we aren't creating an accelerator in the AWS-resource sense.
|
|
||
| // Step 1: Try to disable the accelerator first if it's enabled | ||
| m.logger.Info("Disabling accelerator before deletion", "acceleratorARN", acceleratorARN) | ||
| isAlreadyDeleted, err := m.disableAccelerator(ctx, acceleratorARN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid a second describe call in disableAccelerator if we pass in the sdkAccelerator here? (or is it stale and in need of refreshing?)
| // Status constants | ||
| StatusDeployed = "DEPLOYED" | ||
| StatusInProgress = "IN_PROGRESS" | ||
| StatusDeleting = "Deleting" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AGA doesn't have a service-side construct of deleting - once it's deleted, it's gone.
Description
This PR introduces support for AGA deployer allowing Kubernetes users to manage Global Accelerator resources through Kubernetes custom resources. I have divided this PR into smaller commits to make it easier to review.
Here is the brief description on each of them
Setup AGA SDK Client (a39933f)
Adds AWS Global Accelerator SDK integration with the controller. Implements client interfaces, mock implementations, and cloud provider extensions to interact with the Global Accelerator service. Updates dependencies to include required AWS SDK packages for Global Accelerator.
Add AGA CRD Status Updates Utils (c34f2eb)
Implements status updating utilities for Global Accelerator CRDs. Provides mechanisms to reflect the state of AWS Global Accelerator resources back to Kubernetes custom resources, enabling users to track resource status, lifecycle events, and error conditions directly in Kubernetes.
Add AGA Tags Reconciler (9484b80)
Implements tagging functionality for Global Accelerator resources to maintain proper resource ownership and tracking. Provides interfaces and implementations for applying, updating, and reconciling tags on AWS Global Accelerator resources with comprehensive test coverage. Please note I have moved tracking tags to deployer instead of builders. Also I have added one more region specific tags so that multiple controller in same account in different region do not override each other
Add AGA Controller Config Flags (9695137)
I have disabled AGAController for now until its ready for GA. We will make this enabled by default when we are ready to release this for prod. We have also added exponential back-off duration flags for AGA controller which customer can configure. I wanna also know if we should make requeue time configurable as well.
Add AGA Deployer (47a7bec)
Implements the core deployment logic for Global Accelerator resources. This includes:
update controller-gen version(8dfe522)
Updating the controller-gen version to use v0.19.0
To fix the failing CRD verifications in CI/CD
TODO: I will add e2e tests once we build all the resources separately as this was already getting very big PR.
TODO: I have not added IAM policies. I wanna add when we are done with the upcoming release.
Checklist
README.md, or thedocsdirectory)BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯