OCPBUGS-72387: Update the downloads manifest #1101

jhadvig · 2026-01-23T13:32:42Z

Follow-up to #1099

Problem

Cluster upgrades (4.20→4.21) fail with node drain timeouts:
PDB blocks eviction of unhealthy pods: "Cannot evict pod as it would violate the pod's disruption budget"
Downloads pod eviction times out: "global timeout reached: 1m30s"

Root Causes:

PDB blocking: maxUnavailable: 1 prevents eviction of unhealthy/stuck pods (fixed in OCPBUGS-72387: Fix PDB blocking node drains during cluster upgrades #1099)
Slow termination: terminationGracePeriodSeconds: 0 causes a SIGTERM/SIGKILL race, and sys.exit(0) attempts a slow cleanup of 100 HTTP threads
Slow startup: binary archives are created serially, which blocks the server from starting
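The slow-termination root cause comes down to which exit path the SIGTERM handler takes. A minimal sketch, assuming the downloads server is a Python script whose handler previously called sys.exit(0) (the script itself is not shown in this PR description): sys.exit() only raises SystemExit in the main thread and leaves the interpreter to run its normal shutdown, while os._exit() ends the process immediately without any cleanup. The handler and helper names, and the daemon threads standing in for the 100 HTTP workers, are hypothetical illustrations rather than the actual change.

```python
import os
import signal
import sys
import threading


def slow_exit_handler(signum, frame):
    # Shape of the old handler: sys.exit() raises SystemExit in the main
    # thread, after which the interpreter runs its normal shutdown (joining
    # non-daemon threads, running atexit hooks) before the process ends.
    sys.exit(0)


def immediate_exit_handler(signum, frame):
    # Shape of the fix: os._exit() ends the process at once and skips all
    # interpreter cleanup, so busy worker threads cannot delay termination.
    os._exit(0)


def start_workers(count, target):
    # Start `count` daemon threads running `target`; a hypothetical stand-in
    # for the server's 100 HTTP worker threads.
    workers = [threading.Thread(target=target, daemon=True) for _ in range(count)]
    for worker in workers:
        worker.start()
    return workers


def install_sigterm_handler(handler=immediate_exit_handler):
    # Must be called from the main thread; Python only allows signal
    # handlers to be installed there.
    signal.signal(signal.SIGTERM, handler)
```

Skipping cleanup is reasonable for a process that only serves static files, but os._exit() also skips flushing buffered output and atexit hooks, so it is not a general-purpose replacement for sys.exit().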

Downloads Deployment Fix:

Prevents the SIGTERM/SIGKILL race condition
Forces immediate termination of all 100 worker threads
Pod exits in under 1s, well within the eviction timeout
Creates binary archives asynchronously
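Moving archive creation off the startup path can be sketched with a background thread. This assumes the server packages each binary into .tar and .zip files before serving; the function names, the archive layout, and the write-then-rename step are hypothetical and not taken from the PR. The HTTP server can start listening as soon as build_archives_in_background() returns, and each archive appears only once it is complete.

```python
import os
import tarfile
import tempfile
import threading
import zipfile


def _atomic_write(final_path, write_fn):
    # Write into a temp file in the target directory, then rename it into
    # place so a concurrent HTTP request never sees a half-written archive.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(final_path))
    os.close(fd)
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, final_path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def build_archives(source_path, archive_dir):
    # Package one binary as .tar and .zip (hypothetical layout).
    name = os.path.basename(source_path)
    tar_path = os.path.join(archive_dir, name + ".tar")
    zip_path = os.path.join(archive_dir, name + ".zip")

    def write_tar(path):
        with tarfile.open(path, "w") as tar:
            tar.add(source_path, arcname=name)

    def write_zip(path):
        with zipfile.ZipFile(path, "w") as zf:
            zf.write(source_path, arcname=name)

    _atomic_write(tar_path, write_tar)
    _atomic_write(zip_path, write_zip)
    return tar_path, zip_path


def build_archives_in_background(sources, archive_dir):
    # Run all archive builds on one daemon thread and return immediately,
    # so the caller can start serving HTTP without waiting.
    def worker():
        for source in sources:
            build_archives(source, archive_dir)

    thread = threading.Thread(target=worker, name="archive-builder", daemon=True)
    thread.start()
    return thread
```

Calling build_archives_in_background(binary_paths, archive_dir) just before starting the server lets the pod begin serving while archives are still being built. Because os.replace() swaps the finished file in atomically on a POSIX filesystem, an early request for an archive finds either nothing (a 404, assuming the server handles missing paths that way) or the complete file, never a truncated one.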

/assign @TheRealJon @cajieh

openshift-ci-robot · 2026-01-23T13:32:50Z

@jhadvig: This pull request references Jira Issue OCPBUGS-72387, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.22.0) matches configured target version for branch (4.22.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @yapei

The bug has been updated to refer to the pull request using the external bug tracker.


cajieh · 2026-01-23T14:28:28Z

Looks good 👍, but I will defer to @TheRealJon for a second pair of eyes.

jhadvig · 2026-01-26T12:52:09Z

/retest

openshift-ci · 2026-01-28T03:02:28Z

@jhadvig: all tests passed!


yanpzhan · 2026-01-28T09:18:54Z

Successfully upgraded a 4.21 IBM Cloud cluster to an image built from the PR.
/verified by yanpzhan

openshift-ci-robot · 2026-01-28T09:19:06Z

@yanpzhan: This PR has been marked as verified by yanpzhan.


openshift-ci-robot · 2026-01-28T13:56:14Z

@jhadvig: This pull request references Jira Issue OCPBUGS-72387, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.22.0) matches configured target version for branch (4.22.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @yapei


jhadvig · 2026-01-28T13:57:22Z

Adding the verified label back, since I only needed to remove unnecessary logs.
@TheRealJon PTAL

jhadvig · 2026-01-28T14:26:02Z

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-upgrade-fips
/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-upgrade-fips-nat-instance

openshift-ci · 2026-01-28T14:26:13Z

@jhadvig: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-upgrade-fips

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/468d9930-fc55-11f0-9f54-8d960f04dd9d-0

TheRealJon

I'm no Python expert, but adding asynchronous archive building seems like a solid idea to help pods become ready faster!
/lgtm

openshift-ci · 2026-01-28T16:26:31Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhadvig, TheRealJon


Needs approval from an approver in each of these files:

OWNERS (approved) [TheRealJon,jhadvig]


openshift-ci-robot · 2026-01-28T18:16:35Z

@jhadvig: Jira Issue OCPBUGS-72387: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-72387 has been moved to the MODIFIED state.


openshift-ci bot assigned cajieh and TheRealJon Jan 23, 2026

openshift-ci-robot added the jira/valid-reference and jira/valid-bug labels Jan 23, 2026

openshift-ci bot requested review from TheRealJon, spadgett and yapei January 23, 2026 13:32

openshift-ci bot added the approved label Jan 23, 2026

OCPBUGS-72387: Update the downloads manifest

39b242a

jhadvig force-pushed the OCPBUGS-72387 branch from 6d4442b to 39b242a January 23, 2026 14:06

openshift-ci-robot added the verified label Jan 28, 2026

OCPBUGS-72387: Create binary archives asynchronously

68417b2

jhadvig force-pushed the OCPBUGS-72387 branch from fdaabca to 68417b2 January 28, 2026 13:54

openshift-ci-robot removed the verified label Jan 28, 2026

jhadvig added the verified label Jan 28, 2026

TheRealJon approved these changes Jan 28, 2026


openshift-ci bot added the lgtm label Jan 28, 2026

openshift-merge-bot bot merged commit 0ad3482 into openshift:main Jan 28, 2026
10 checks passed
