
[WIP] Launch GCP debug cluster with configurable wait time #76582

Open
XiyunZhao wants to merge 2 commits into openshift:main from XiyunZhao:launchcluster

@XiyunZhao (Contributor) commented Mar 20, 2026

✨ Summary

Launch a GCP OpenShift cluster that stays alive for up to 12 hours for debugging. The job automatically uses the latest 4.22 nightly build, so no manual image version updates are needed.

🔧 Changes

  • ✅ Auto-use latest 4.22 nightly build (no hardcoded version)
  • ✅ Cluster stays alive for 12 hours (or until manually destroyed)
  • ✅ Wait loop checks for the stop signal (the stop-preserving ConfigMap) every 30 seconds
  • ✅ Works with GCP IPI jobs (13 configurations available, listed below)

🚀 Quick Start

1. Trigger a debug cluster

Comment on this PR:

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-f14-stress-olm

2. Wait for cluster provisioning

  • Monitor the PR checks for the job status
  • The cluster uses the latest 4.22 nightly build automatically
  • Once the cluster is ready, the job waits for up to 12 hours

3. Access the cluster

Download the kubeconfig from the CI job artifacts, reachable through the Prow link in the PR checks.
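Once the kubeconfig is downloaded, pointing oc at it is just a matter of exporting KUBECONFIG. The helper below is a hypothetical convenience sketch (use_debug_kubeconfig is not part of this PR), and since the exact artifact path is not given here, it simply takes wherever you saved the file.

```bash
# Hypothetical helper (not part of this PR): point oc at a kubeconfig
# downloaded from the CI job artifacts. Requires bash.
use_debug_kubeconfig() {
  local path="$1" abs_path
  # -s: the file must exist and be non-empty.
  if [[ ! -s "${path}" ]]; then
    echo "kubeconfig not found or empty: ${path}" >&2
    return 1
  fi
  # Store an absolute path so KUBECONFIG keeps working after a cd.
  abs_path="$(cd "$(dirname "${path}")" && pwd)/$(basename "${path}")"
  export KUBECONFIG="${abs_path}"
  echo "KUBECONFIG=${KUBECONFIG}"
}
```

Typical use: use_debug_kubeconfig ./kubeconfig && oc get clusterversion && oc get nodes, which confirms the cluster is reachable and shows the installed nightly version.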

4. Debug your issue

The cluster stays alive for up to 12 hours.

5. Destroy when done

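To terminate early, create the stop-signal ConfigMap. The wait loop exits at its next check (within about 30 seconds) and the job moves on to tear the cluster down: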

oc create configmap stop-preserving -n default
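The stop signal works because the job polls for it. A minimal sketch of such a wait loop follows, assuming bash and a KUBECONFIG that points at the debug cluster. The names wait_for_stop_signal, check_stop_signal, MAX_WAIT_SECONDS and POLL_INTERVAL_SECONDS are hypothetical and are not taken from this PR's step script; only the ConfigMap name, namespace, 12-hour limit and 30-second interval come from the description above.

```bash
# Sketch of a "keep the cluster alive" wait loop, written from the PR
# description; not the step script this PR actually changes.
# Assumes KUBECONFIG points at the debug cluster. Requires bash.

# Hypothetical tunables; the defaults mirror the PR text (12 h, 30 s).
MAX_WAIT_SECONDS="${MAX_WAIT_SECONDS:-43200}"
POLL_INTERVAL_SECONDS="${POLL_INTERVAL_SECONDS:-30}"

# Succeeds only if the stop-signal ConfigMap exists.
check_stop_signal() {
  oc get configmap stop-preserving -n default >/dev/null 2>&1
}

# Polls until the stop signal appears or max_wait seconds have elapsed.
# elapsed counts sleep time only, so the real wait runs slightly longer.
# Returns 0 if stopped early, 1 on timeout, 2 on a bad interval.
wait_for_stop_signal() {
  local max_wait="$1" interval="$2" elapsed=0
  if (( interval <= 0 )); then
    echo "poll interval must be positive, got ${interval}" >&2
    return 2
  fi
  while (( elapsed < max_wait )); do
    if check_stop_signal; then
      echo "stop-preserving ConfigMap found after ${elapsed}s; ending wait early"
      return 0
    fi
    sleep "${interval}"
    elapsed=$(( elapsed + interval ))
  done
  echo "no stop signal after ${max_wait}s; ending wait"
  return 1
}
```

In a step script running under set -o errexit, a call such as wait_for_stop_signal "${MAX_WAIT_SECONDS}" "${POLL_INTERVAL_SECONDS}" || true keeps a timeout from failing the step, so teardown runs either way. The step's own timeout still bounds the wait; the second commit raises it from 8h to 12h.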

📋 Available Environments

All of these jobs work with debug mode:

| Job | Type | Command |
|---|---|---|
| gcp-ipi-f14-stress-olm | Standard (recommended) | /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-f14-stress-olm |
| gcp-ipi-mini-perm-custom-type-f28 | Custom machine type | /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-mini-perm-custom-type-f28 |
| gcp-ipi-marketplace-mini-perm-f28 | Marketplace image | /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-marketplace-mini-perm-f28 |
| gcp-ipi-f7-longduration-mco-critical | MCO critical tests | /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-f7-longduration-mco-critical |
| ...and 9 more | See the full list below | |

⭐ = Recommended for general debugging

All 13 available jobs:
  1. olmv1-benchmark-test
  2. gcp-ipi-f14-stress-olm ⭐
  3. gcp-ipi-mini-perm-custom-type-f28
  4. gcp-ipi-marketplace-mini-perm-f28
  5. gcp-ipi-marketplace-mini-perm-f28-destructive
  6. gcp-ipi-f7-longduration-mco-critical
  7. gcp-ipi-f7-longduration-mco-p1
  8. gcp-ipi-f7-longduration-mco-p2
  9. gcp-ipi-f7-longduration-mco-p3
  10. gcp-ipi-longduration-tp-mco-p1-f7
  11. gcp-ipi-longduration-tp-mco-p2-f7
  12. gcp-ipi-longduration-tp-mco-p3-f7
  13. gcp-ipi-to-multiarch-mini-perm-f14

All use the format: /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-<job-name>
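Because every command shares the same prefix, the comment can be generated from the short job names in the list. This is a hypothetical helper (rehearse_comment is not part of this PR) that relies on pj-rehearse accepting several test names separated by spaces. It only builds the string; it does not check that a job exists or is affected by the PR, which is exactly what the bot's "either don't exist or were not found to be affected" reply in this conversation is about.

```bash
# Hypothetical helper: build a /pj-rehearse comment from the short job
# names listed above. Requires bash.
REHEARSE_PREFIX="periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-"

rehearse_comment() {
  if (( $# == 0 )); then
    echo "usage: rehearse_comment <job-name>..." >&2
    return 1
  fi
  local job_name
  local -a full_names=()
  for job_name in "$@"; do
    full_names+=("${REHEARSE_PREFIX}${job_name}")
  done
  # "${full_names[*]}" joins with the first IFS character (a space by default);
  # pj-rehearse accepts several test names separated by spaces.
  printf '/pj-rehearse %s\n' "${full_names[*]}"
}
```

For example, rehearse_comment gcp-ipi-f14-stress-olm prints the same command as step 1 of the Quick Start.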

🎯 Advanced: Use Specific Image Version

If you need a specific nightly build instead of the latest one, add a job override to the job's ci-operator config file:

- as: gcp-ipi-f14-stress-olm-custom
  cron: 34 14 14,28 * *
  steps:
    cluster_profile: gcp-qe
    env:
      CUSTOM_OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE: registry.ci.openshift.org/ocp/release:4.22.0-0.nightly-2026-03-18-195154
    test:
    - chain: openshift-e2e-test-olm-qe-stress
    workflow: cucushift-installer-rehearse-gcp-ipi

Then rehearse the new job: /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-f14-stress-olm-custom
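The override follows the "empty means latest nightly" rule from the first commit. One way a step could make that choice, and reject a mistyped pull spec before the install starts, is sketched below. Apart from CUSTOM_OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE, the names are hypothetical, and the pattern is an assumption based on the example image tag above.

```bash
# Sketch of the "custom override, else latest nightly" choice described in
# this PR. Names other than CUSTOM_OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE
# are hypothetical. Requires bash.

# Expected shape, e.g. registry.ci.openshift.org/ocp/release:4.22.0-0.nightly-2026-03-18-195154
NIGHTLY_PULLSPEC_RE='^registry\.ci\.openshift\.org/ocp/release:4\.22\.0-0\.nightly-[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{6}$'

# Prints the image to install: the override if set, otherwise $1 (the
# latest-nightly pull spec). Rejects overrides that do not match the pattern.
resolve_release_image() {
  local default_image="$1"
  local custom="${CUSTOM_OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE:-}"
  if [[ -z "${custom}" ]]; then
    printf '%s\n' "${default_image}"
  elif [[ "${custom}" =~ ${NIGHTLY_PULLSPEC_RE} ]]; then
    printf '%s\n' "${custom}"
  else
    echo "unexpected override format: ${custom}" >&2
    return 1
  fi
}
```

Assuming the step receives the latest-nightly pull spec in a variable such as RELEASE_IMAGE_LATEST (as ci-operator typically provides), it could run OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="$(resolve_release_image "${RELEASE_IMAGE_LATEST}")". Pinning 4.22 in the pattern catches wrong-stream typos and leftover YYYY-MM-DD placeholders, but the pattern would need updating for other releases.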

@openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Mar 20, 2026
@openshift-ci bot requested review from Xia-Zhao-rh and gpei on March 20, 2026 08:50
@XiyunZhao (Contributor, Author)

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-f14-stress-olm

@openshift-ci-robot (Contributor)

@XiyunZhao: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@XiyunZhao (Contributor, Author)

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-f14-stress-olm

@openshift-ci-robot (Contributor)

@XiyunZhao: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@XiyunZhao (Contributor, Author)

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-f14-stress-olm

@openshift-ci-robot (Contributor)

@XiyunZhao: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot (Contributor)

@XiyunZhao: job(s): periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-f14-stress-olm either don't exist or were not found to be affected, and cannot be rehearsed

Allow specifying custom build version via environment variable.
Default empty value uses latest nightly build.

Usage in script:
- Edit PR to set specific version: CUSTOM_OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.ci.openshift.org/ocp/release:4.22.0-0.nightly-YYYY-MM-DD-HHMMSS
- Or leave empty for latest nightly

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Add 12h wait loop checking for stop-preserving configmap
- Increase timeout from 8h to 12h
- Exits early when stop signal detected

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@openshift-ci bot (Contributor) commented Mar 20, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: XiyunZhao
Once this PR has been reviewed and has the lgtm label, please assign jianlinliu, xia-zhao-rh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot (Contributor)

[REHEARSALNOTIFIER]
@XiyunZhao: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
|---|---|---|---|
| pull-ci-openshift-ibm-vpc-block-csi-driver-operator-main-e2e-ibmcloud-csi-extended | openshift/ibm-vpc-block-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-ibm-vpc-block-csi-driver-operator-release-5.0-e2e-ibmcloud-csi-extended | openshift/ibm-vpc-block-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-ibm-vpc-block-csi-driver-operator-release-4.23-e2e-ibmcloud-csi-extended | openshift/ibm-vpc-block-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-ibm-vpc-block-csi-driver-operator-release-4.22-e2e-ibmcloud-csi-extended | openshift/ibm-vpc-block-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-ibm-vpc-block-csi-driver-operator-release-4.21-e2e-ibmcloud-csi-extended | openshift/ibm-vpc-block-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-ibm-vpc-block-csi-driver-operator-release-4.20-e2e-ibmcloud-csi-extended | openshift/ibm-vpc-block-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-ibm-vpc-block-csi-driver-operator-release-4.19-e2e-ibmcloud-csi-extended | openshift/ibm-vpc-block-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-ibm-vpc-block-csi-driver-operator-release-4.18-e2e-ibmcloud-csi-extended | openshift/ibm-vpc-block-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-ibm-vpc-block-csi-driver-operator-release-4.17-e2e-ibmcloud-csi-extended | openshift/ibm-vpc-block-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-ibm-vpc-block-csi-driver-operator-release-4.16-e2e-ibmcloud-csi-extended | openshift/ibm-vpc-block-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-ibm-vpc-block-csi-driver-operator-release-4.15-e2e-ibmcloud-csi-extended | openshift/ibm-vpc-block-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-ibm-vpc-block-csi-driver-operator-release-4.14-e2e-ibmcloud-csi-extended | openshift/ibm-vpc-block-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-netobserv-netobserv-ebpf-agent-main-qe-e2e-tests | netobserv/netobserv-ebpf-agent | presubmit | Registry content changed |
| pull-ci-openshift-openshift-tests-private-main-debug-disasterrecovery-aws-ipi | openshift/openshift-tests-private | presubmit | Registry content changed |
| pull-ci-openshift-openshift-tests-private-main-debug-disasterrecovery-baremetal-upi | openshift/openshift-tests-private | presubmit | Registry content changed |
| pull-ci-openshift-openshift-tests-private-release-5.0-debug-disasterrecovery-aws-ipi | openshift/openshift-tests-private | presubmit | Registry content changed |
| pull-ci-openshift-openshift-tests-private-release-5.0-debug-disasterrecovery-baremetal-upi | openshift/openshift-tests-private | presubmit | Registry content changed |
| pull-ci-openshift-openshift-tests-private-release-4.23-debug-disasterrecovery-aws-ipi | openshift/openshift-tests-private | presubmit | Registry content changed |
| pull-ci-openshift-openshift-tests-private-release-4.23-debug-disasterrecovery-baremetal-upi | openshift/openshift-tests-private | presubmit | Registry content changed |
| pull-ci-openshift-openshift-tests-private-release-4.22-debug-disasterrecovery-aws-ipi | openshift/openshift-tests-private | presubmit | Registry content changed |
| pull-ci-openshift-openshift-tests-private-release-4.22-debug-disasterrecovery-baremetal-upi | openshift/openshift-tests-private | presubmit | Registry content changed |
| pull-ci-openshift-openshift-tests-private-release-4.21-debug-disasterrecovery-aws-ipi | openshift/openshift-tests-private | presubmit | Registry content changed |
| pull-ci-openshift-openshift-tests-private-release-4.21-debug-disasterrecovery-baremetal-upi | openshift/openshift-tests-private | presubmit | Registry content changed |
| pull-ci-openshift-openshift-tests-private-release-4.20-debug-disasterrecovery-aws-ipi | openshift/openshift-tests-private | presubmit | Registry content changed |
| pull-ci-openshift-openshift-tests-private-release-4.20-debug-disasterrecovery-baremetal-upi | openshift/openshift-tests-private | presubmit | Registry content changed |

A total of 4372 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@XiyunZhao (Contributor, Author)

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-f14-stress-olm

@openshift-ci-robot (Contributor)

@XiyunZhao: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@XiyunZhao (Contributor, Author)

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-f14-stress-olm

@openshift-ci-robot (Contributor)

@XiyunZhao: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@XiyunZhao (Contributor, Author)

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-gcp-ipi-f14-stress-olm

@openshift-ci-robot (Contributor)

@XiyunZhao: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci bot (Contributor) commented Mar 20, 2026

@XiyunZhao: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.


2 participants