separate pvc retain/delete into a separate test case to control timeline better.#1139
separate pvc retain/delete into a separate test case to control timeline better.#1139kannon92 wants to merge 1 commit intokubernetes-sigs:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kannon92 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
✅ Deploy Preview for kubernetes-sigs-jobset canceled.
|
There was a problem hiding this comment.
Pull request overview
This PR refactors the PVC retention/reuse integration test for the JobSet controller into its own dedicated Ginkgo test case to better control test ordering and reduce flakiness around PVC lifecycle handling.
Changes:
- Removed the
ginkgo.Entry("jobset with VolumeClaimPolicies should create two PVC with Delete and Retain policies", ...)from the genericDescribeTable-driven controller tests. - Added a standalone
ginkgo.It("JobSet with VolumeClaimPolicies should create, retain, and reuse PVCs", ...)that explicitly creates a namespace, a first JobSet with Delete/Retain PVC policies, verifies completion, deletes and cleans up PVCs, and then creates a second JobSet that reuses the retained PVC. - Kept the existing suspended-JobSet-with-VolumeClaimPolicies test and the controller helper functions, while reusing them where appropriate (e.g.,
completeAllJobs,JobSetCompleted).
| ginkgo.By("manually deleting PVCs with JobSet OwnerReference") | ||
| var pvcList corev1.PersistentVolumeClaimList | ||
| gomega.Expect(k8sClient.List(ctx, &pvcList, client.InNamespace(namespace))).To(gomega.Succeed()) | ||
| for i := range pvcList.Items { | ||
| pvc := &pvcList.Items[i] | ||
| // Envtest doesn't support garbage collection. | ||
| if len(pvc.OwnerReferences) > 0 { | ||
| // Remove finalizers to allow deletion in envtest. | ||
| pvc.Finalizers = []string{} | ||
| gomega.Expect(k8sClient.Update(ctx, pvc)).To(gomega.Succeed()) | ||
| gomega.Expect(k8sClient.Delete(ctx, pvc)).To(gomega.Succeed()) | ||
| } |
There was a problem hiding this comment.
The manual PVC cleanup relies solely on len(pvc.OwnerReferences) > 0 to decide which PVCs to delete. Because the JobSet controller removes the owner reference from the retained PVC asynchronously when the JobSet is deleted, this loop can race with that reconciliation and end up deleting the PVC that is supposed to be retained if the controller has not yet cleared its owner reference. That makes the "retain" semantics under test non-deterministic and can cause flakes when the retained PVC is deleted before the later assertion. To avoid this, the cleanup should deterministically target only the PVCs that are expected to be deleted (for example by matching on the known test-volume name or retention policy) rather than using the presence of an owner reference as the discriminator.
| }, | ||
| }, | ||
| }), | ||
| ginkgo.Entry("suspended jobset with VolumeClaimPolicies should create PVCs", &testCase{ |
There was a problem hiding this comment.
Do we need to move suspend tests as well?
There was a problem hiding this comment.
I'll check today but I didn't see any flakes on that job.
|
/test pull-jobset-test-integration-main |
|
@kannon92: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/hold Looks to still be flaky. |
What type of PR is this?
/kind flake
What this PR does / why we need it:
Separate integration test with PVC to a separate test case as there is a lot of orchestration needed.
Moving to a separate test case makes it clear what we need to run in order and hopefully deflakes the test.
Which issue(s) this PR fixes:
Fixes #1133
Special notes for your reviewer:
Does this PR introduce a user-facing change?