OCPBUGS-62626: Auto fix Skip machine scaling test on two-node clusters with 0 worker replicas#31002
ropatil010 wants to merge 1 commit into openshift:main
This fixes a test failure on two-node/compact clusters where worker machineSets exist but have 0 replicas.

Problem:
The test "[sig-cluster-lifecycle][Feature:Machines][Serial] Managed cluster should grow and decrease when scaling different machineSets simultaneously" was failing on two-node baremetal clusters with:
Error: not all machines have a node reference: map[]

Root Cause:
On two-node clusters, worker machineSets may have spec.replicas=0 because workers are handled differently (masters also serve as workers). When the test attempted to scale from 0→1:
1. A new machine was created
2. The machine never got a nodeRef due to limited hardware resources
3. getNodesFromMachineSet() failed waiting for the nodeRef

Solution:
Add a pre-flight check to skip the test if any worker machineSet has 0 replicas. This check is performed BEFORE any scaling operations, preventing the failure.
The fix also moves the Machine API and machineSet checks to the beginning of the test for better early validation.

Tested on:
- Two-node cluster with 0 worker machineSets: SKIP (expected)
- Two-node cluster with worker machineSet having 0 replicas: SKIP (expected)

Related: OCPBUGS-62626
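The pre-flight check described under Solution could look roughly like the Go sketch below. It is not the code from this PR: `machineSet`, `zeroReplicaSets`, and `int32Ptr` are hypothetical names, and the struct is a simplified stand-in for the real Machine API object. The sketch assumes that a machineSet with unset replicas should not trigger the skip.

```go
package main

import "fmt"

// machineSet is a hypothetical, simplified stand-in for a Machine API
// MachineSet; only the fields needed for the check are modelled.
type machineSet struct {
	Name     string
	Replicas *int32 // mirrors spec.replicas; nil means "not set"
}

// zeroReplicaSets returns the names of machineSets whose replicas field is
// explicitly 0. Unset (nil) replicas are assumed not to be zero.
func zeroReplicaSets(sets []machineSet) []string {
	var names []string
	for _, ms := range sets {
		if ms.Replicas != nil && *ms.Replicas == 0 {
			names = append(names, ms.Name)
		}
	}
	return names
}

// int32Ptr is a small helper for building test data.
func int32Ptr(v int32) *int32 { return &v }

func main() {
	sets := []machineSet{
		{Name: "worker-0", Replicas: int32Ptr(0)},
	}
	// Run the check before any scaling so that no machine is created
	// on a cluster that cannot provide a node for it.
	if names := zeroReplicaSets(sets); len(names) > 0 {
		fmt.Printf("skip: worker machineSets with 0 replicas: %v\n", names)
		return
	}
	fmt.Println("proceed with scaling test")
}
```

In a real e2e test the early return would be replaced by the test framework's skip call, placed ahead of the first scale-up.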
@ropatil010: This pull request references Jira Issue OCPBUGS-62626, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
No actionable comments were generated in the recent review.
Walkthrough: Reorders and extends preconditions in a machine scaling test by moving operator verification and machineSet fetching earlier in the test body. Adds a conditional skip when any worker MachineSet reports zero replicas, indicating a two-node cluster scenario.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: ropatil010
/assign @atiratree
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ovn-two-node-fencing-serial-techpreview-3of3
@ropatil010: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e8983490-3726-11f1-848f-a4914750a001-0
Scheduling required tests:
/cc @mrda
@ropatil010: The following tests failed, say
Risk analysis has seen new tests most likely introduced by this PR. New tests seen in this PR at sha: a7aedfd
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ovn-two-node-fencing-serial-techpreview-3of3
@ropatil010: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/2f20f770-375a-11f1-861d-2e5141bd6f7b-0
/retest