-
Notifications
You must be signed in to change notification settings - Fork 104
feat(images): add expired image upgrade logic to nodeclass status controller #1169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
charliedmcb
wants to merge
20
commits into
main
Choose a base branch
from
charliedmcb/addExpiredImageUpgradeLogic
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add 90-day expiration logic for Linux images to comply with AKS requirements. Images older than 90 days are automatically upgraded to the latest available version. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Separate version parsing logic into dedicated parseLinuxImage function that returns year, month, day, patch, and error for better testability and maintainability. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
…n check Pass context to upgradeAnyExpiredImages function and use logging.ImageID constant for consistent logging across the codebase. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Expire images 30 minutes before the actual 90-day limit to provide a buffer and ensure we don't accidentally try to provision with an image that expires during the provisioning process. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
…s TODO Generalize function name while maintaining Linux-only implementation. Add clear TODO comment indicating where Windows image version format (OS.PATCH.YYMMDD) support should be added. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Add OverrideNodeImageVersions field and Reset method to NodeImageVersionsAPI for isolated test data. Expose NodeImageVersionsAPI in test Environment and integrate Reset into Environment.Reset(). Makes nodeImageVersions private to prevent accidental modification. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Add test coverage for expired image upgrade functionality including: - Runtime-generated image versions with mixed ages (expired, recent, malformed) - SIG configuration with proper subscription ID format - Exact image comparison validation for upgrade scenarios - Test isolation using override mechanism in fake API 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Remove the expired image upgrade functionality to isolate it on a separate branch. This reverts the images controller and tests back to the main branch state while keeping the fake API improvements on the current branch. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Add Error field to NodeImageVersionsAPI to enable testing of error scenarios. The Error field takes precedence over normal data when set, and is cleared during Reset() calls for proper test isolation. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
3 tasks
…ller" This reverts commit 4d3e6fa.
Base automatically changed from
charliedmcb/extendListNodeImagesFake
to
main
September 19, 2025 22:43
Add isLinuxImageVersion function to detect Linux image format (YYYYMM.DD.PATCH) using regex validation. Update isImageExpired to return early with "not expired" for non-Linux images, avoiding parsing errors for Windows or other image formats. This ensures only Linux images are processed for 90-day expiration checking while gracefully handling other image formats. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Restructure isImageExpired to pre-declare year/month/day variables and use conditional logic for Linux vs Windows image format detection. Return proper errors for unsupported formats instead of silently treating them as non-expired. This maintains error handling for malformed images while providing a clear structure for adding Windows image support in the future. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Improve isLinuxImageVersion to use regex capture groups and return parsed components directly. Simplify isImageExpired to use direct assignment and early return pattern. Remove unused parseLinuxImage function. This eliminates redundant parsing and makes the code more efficient and readable while maintaining the same functionality. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
charliedmcb
commented
Sep 23, 2025
| goalImages = overrideAnyGoalStateVersionsWithExisting(nodeClass, goalImages) | ||
| updatedImages := overrideAnyGoalStateVersionsWithExisting(nodeClass, goalImages) | ||
| // Scenario C: Upgrade any images that are more than 90 days old (AKS compliance requirement) | ||
| goalImages = upgradeAnyExpiredImages(ctx, updatedImages, goalImages) |
Contributor
Author
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of expiring the images, we could set the images not ready with reason ImagesExpired to prevent provisioning, and only bump them out of this particular unready state when a K8s upgrade occurs, or a maintenance window opens.
Something like:
nodeClass.StatusConditions().SetFalse(v1beta1.ConditionTypeImagesReady, "ImagesExpired", "Images are older than 90 days, and have expired. Must open a maintenance window or preform a Kubernetes upgrade")
Reset Azure-specific context options to defaults in BeforeEach to prevent SIG/CIG configuration from bleeding between tests. This fixes test failures that occurred when running full test suites where some tests configure SIG mode while others expect CIG mode. The issue was that nested BeforeEach blocks were setting UseSIG=true but the top-level BeforeEach wasn't resetting Azure options back to defaults, causing context pollution between test runs. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #
Description
Add 90-day expiration logic for Linux images to comply with AKS requirements. Images older than 90 days are automatically upgraded to the latest available version.
See this comment for additional option on setting unreadiness, rather than forced upgrade:
#1169 (comment)
How was this change tested?
Note: did notice a new flaky unit test failure, but doesn't seem related to this change:

https://github.com/Azure/karpenter-provider-azure/actions/runs/17959552654/job/51079421795?pr=1169#step:6:3093
Does this change impact docs?
Release Note