OCPBUGS-78498: Update skew error message with doc links#5825
OCPBUGS-78498: Update skew error message with doc links#5825openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
Conversation
|
@djoshy: This pull request references Jira Issue OCPBUGS-78498, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughUpdated a PrometheusRule alert link and replaced a placeholder docs URL in operator status logic by deriving an OCP docs version from ClusterVersion history; added a helper for current OCP version and adjusted install-version selection semantics. Tests were updated to exercise ClusterVersion-driven behavior and expected messages. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
/jira refresh /test unit |
|
@djoshy: This pull request references Jira Issue OCPBUGS-78498, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/operator/sync.go (1)
2553-2586:⚠️ Potential issue | 🟠 MajorMigrate existing automatic status to the install-version baseline.
Line 2553 changes the source to the install version, but this branch still only initializes
MachineConfiguration.Status.BootImageSkewEnforcementStatuswhen automatic mode flips or the status is empty. Clusters already inAutomaticwill keep the pre-changeAutomatic.OCPVersionalready persisted in status, so the new install-version semantics never take effect there. If you add the migration here, updateAutomatic.OCPVersionin place—pkg/apihelpers/apihelpers.go:584-592builds a fresh automatic struct and would drop any populatedRHCOSVersion.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/sync.go` around lines 2553 - 2586, The code switches the baseline to the install OCP version but only replaces the automatic status when flipping modes or when status is empty, which leaves existing Automatic entries with their old OCPVersion; to migrate in-place, detect the automated-path branch (when supportsBootImageUpdates && apihelpers.HasMAPIMachineSetManagerWithMode(...) is true) and if newMachineConfigurationStatus.BootImageSkewEnforcementStatus.Mode == opv1.BootImageSkewEnforcementModeStatusAutomatic (or mcop.Status.BootImageSkewEnforcementStatus.Mode is Automatic), set the Automatic.OCPVersion field to the value returned by optr.getOCPInstallVersionFromClusterVersion() on the existing newMachineConfigurationStatus.BootImageSkewEnforcementStatus.Automatic struct rather than calling apihelpers.GetSkewEnforcementStatusAutomaticWithOCPVersion (which builds a fresh struct and would drop any populated RHCOSVersion); only fall back to replacing the status with apihelpers.GetSkewEnforcementStatusAutomaticWithOCPVersion(...) when the status is empty or not present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@pkg/operator/sync.go`:
- Around line 2553-2586: The code switches the baseline to the install OCP
version but only replaces the automatic status when flipping modes or when
status is empty, which leaves existing Automatic entries with their old
OCPVersion; to migrate in-place, detect the automated-path branch (when
supportsBootImageUpdates && apihelpers.HasMAPIMachineSetManagerWithMode(...) is
true) and if newMachineConfigurationStatus.BootImageSkewEnforcementStatus.Mode
== opv1.BootImageSkewEnforcementModeStatusAutomatic (or
mcop.Status.BootImageSkewEnforcementStatus.Mode is Automatic), set the
Automatic.OCPVersion field to the value returned by
optr.getOCPInstallVersionFromClusterVersion() on the existing
newMachineConfigurationStatus.BootImageSkewEnforcementStatus.Automatic struct
rather than calling apihelpers.GetSkewEnforcementStatusAutomaticWithOCPVersion
(which builds a fresh struct and would drop any populated RHCOSVersion); only
fall back to replacing the status with
apihelpers.GetSkewEnforcementStatusAutomaticWithOCPVersion(...) when the status
is empty or not present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 639abbdd-7da8-454b-a2c7-91afa5b016cc
📒 Files selected for processing (3)
install/0000_90_machine-config_01_prometheus-rules.yamlpkg/operator/status.gopkg/operator/sync.go
|
@djoshy: This pull request references Jira Issue OCPBUGS-78498, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note that the link is an estimate at this point, based on previews from openshift/openshift-docs#108103. This PR is just to verify the injection mechanism works as intended. The link may be tweaked prior to merge or in a later update. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/operator/sync.go (1)
2622-2626:⚠️ Potential issue | 🟠 MajorReturn unknown install versions as empty, not
0.0.0.Line 2625 turns a parse failure into a synthetic version that will always look out of skew. That can disable upgrades on malformed
ClusterVersionhistory instead of treating the install version as unknown.Suggested fix
parsedVersion, err := k8sversion.ParseGeneric(installVersion) if err != nil { - klog.Warningf("Failed to parse install version %q: %v, use a placeholder for now", installVersion, err) - return "0.0.0" + klog.Warningf("Failed to parse install version %q: %v", installVersion, err) + return "" }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/sync.go` around lines 2622 - 2626, The code in the block using k8sversion.ParseGeneric(installVersion) currently returns the synthetic "0.0.0" on parse failure; change this to return an empty string to represent an unknown install version instead. In the error branch where err != nil (around parsedVersion, err := k8sversion.ParseGeneric(installVersion)), update the return value from "0.0.0" to "" and keep the klog.Warningf call (optionally augmenting the message to state the install version is unknown), so callers receiving the install version can treat it as unknown rather than a concrete out-of-skew version.
🧹 Nitpick comments (1)
pkg/operator/status_test.go (1)
1282-1293: Add coverage for the"latest"docs fallback.The new harness can inject
ClusterVersion, but none of the blocked-upgrade cases leave it unset or unparsable. That means thecheckBootImageSkewUpgradeableGuard()fallback to.../latest/...is still untested.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/status_test.go` around lines 1282 - 1293, The tests never cover the fallback path in checkBootImageSkewUpgradeableGuard that uses the ".../latest/..." docs URL because the harness always injects a valid ClusterVersion; update the test cases in status_test.go to include at least one scenario where the ClusterVersion is nil or unparsable so clusterVersionLister contains no usable ClusterVersion and triggers the fallback. Specifically, modify the table-driven cases that construct clusterVersionIndexer/clusterVersionLister (used when creating the Operator with field clusterVersionLister) to leave tc.clusterVersion nil or provide an invalid ClusterVersion object for one blocked-upgrade case, and add assertions expecting the docs URL to contain "/latest/" to verify the fallback behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@pkg/operator/sync.go`:
- Around line 2622-2626: The code in the block using
k8sversion.ParseGeneric(installVersion) currently returns the synthetic "0.0.0"
on parse failure; change this to return an empty string to represent an unknown
install version instead. In the error branch where err != nil (around
parsedVersion, err := k8sversion.ParseGeneric(installVersion)), update the
return value from "0.0.0" to "" and keep the klog.Warningf call (optionally
augmenting the message to state the install version is unknown), so callers
receiving the install version can treat it as unknown rather than a concrete
out-of-skew version.
---
Nitpick comments:
In `@pkg/operator/status_test.go`:
- Around line 1282-1293: The tests never cover the fallback path in
checkBootImageSkewUpgradeableGuard that uses the ".../latest/..." docs URL
because the harness always injects a valid ClusterVersion; update the test cases
in status_test.go to include at least one scenario where the ClusterVersion is
nil or unparsable so clusterVersionLister contains no usable ClusterVersion and
triggers the fallback. Specifically, modify the table-driven cases that
construct clusterVersionIndexer/clusterVersionLister (used when creating the
Operator with field clusterVersionLister) to leave tc.clusterVersion nil or
provide an invalid ClusterVersion object for one blocked-upgrade case, and add
assertions expecting the docs URL to contain "/latest/" to verify the fallback
behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 15d309bb-8eaa-4269-a2a4-5a9970e7868d
📒 Files selected for processing (4)
install/0000_90_machine-config_01_prometheus-rules.yamlpkg/operator/status.gopkg/operator/status_test.gopkg/operator/sync.go
✅ Files skipped from review due to trivial changes (1)
- install/0000_90_machine-config_01_prometheus-rules.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/operator/status.go
| annotations: | ||
| summary: "Boot image skew enforcement is disabled. Scaling operations may not be successful." | ||
| description: "Boot image skew enforcement mode is set to None. When scaling up, new nodes may be provisioned with older boot images that could introduce compatibility issues. Consider manually updating boot images to match the cluster version. Please refer to docs at [TODO-INSERTLINK] for additional details." | ||
| description: "Boot image skew enforcement mode is set to None. When scaling up, new nodes may be provisioned with older boot images that could introduce compatibility issues. Consider manually updating boot images to match the cluster version. Please refer to docs at https://docs.redhat.com/en/documentation/openshift_container_platform/latest/html/machine_configuration/mco-update-boot-skew-mgmt for additional details." |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: djoshy, isabella-janssen The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
I manually tested this on an AWS cluster to verify:
|
|
/retest-required |
|
@djoshy: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/payload periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of3 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-2of3 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-3of3 |
|
@sergiordlr: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info. |
|
/payload-job periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of3 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-2of3 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-3of3 |
|
@djoshy: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f411b7c0-2ce9-11f1-80a3-79af1b92e756-0 |
|
/payload-job periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive |
|
@djoshy: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/06561430-2cea-11f1-86c3-dcea378f80e4-0 |
|
/test all |
|
/payload-job periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of3 |
|
@djoshy: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c8eb5a70-2d10-11f1-8d5e-f82fc8292aeb-0 |
|
/retest-required |
|
/verified by payloads Chatted with @sergiordlr , we think existing tests and new units are enough to verify this behavior. |
|
@djoshy: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
39ca983
into
openshift:main
|
@djoshy: Jira Issue Verification Checks: Jira Issue OCPBUGS-78498 Jira Issue OCPBUGS-78498 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Fix included in accepted release 4.22.0-0.nightly-2026-04-01-092906 |
Replaces the [TODO: insert link] placeholder in the
Upgradeableguard message with a real docs URL that dynamically reflects the cluster's current running version (e.g. 4.21). Falls back to latest if the version can't be determined. Tests and helpers were updated to account for this. The Prometheus alert only uses the latest link as the MCO cannot dynamically inject into the Prometheus manifest.