Skip to content

[WIP] HIVE-2139: Reuse allocated buffers in the resource helper#2873

Closed
dlom wants to merge 2 commits intoopenshift:masterfrom
dlom:HIVE-2139-bufferpool
Closed

[WIP] HIVE-2139: Reuse allocated buffers in the resource helper#2873
dlom wants to merge 2 commits intoopenshift:masterfrom
dlom:HIVE-2139-bufferpool

Conversation

@dlom
Copy link
Contributor

@dlom dlom commented Mar 17, 2026

xref: HIVE-2139

This is the first of potentially multiple PRs addressing memory issues in the clustersync controller (and hive in general). This PR is split into two commits: The first adds a basic benchmark harness for the resource helper, focused mainly on driving the Apply and Patch methods. This pulls in envtest as a test dependency. The second commit is the actual optimization: reusing allocated buffers in the resource helper.

Claude wrote the benchmarks and the comparison script for me, so take them with a grain of salt

Preliminary benchmark results: -benchtime=3x -count=6

goos: linux
goarch: amd64
pkg: github.com/openshift/hive/pkg/resource
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
                                             │  before.txt   │              after.txt              │
                                             │    sec/op     │    sec/op     vs base               │
Apply/ConfigMap-12                              2.531m ± 26%   1.725m ± 22%  -31.85% (p=0.002 n=6)
Apply/Secret-12                                 3.327m ± 47%   1.934m ± 15%  -41.89% (p=0.002 n=6)
Apply/Deployment-12                             4.249m ± 16%   2.332m ± 13%  -45.10% (p=0.002 n=6)
Apply/ServiceAccount-12                         3.537m ± 25%   1.786m ±  6%  -49.52% (p=0.002 n=6)
ApplyMultipleResources/5Resources-12           12.340m ± 13%   8.327m ±  7%  -32.52% (p=0.002 n=6)
ApplyMultipleResources/10Resources-12           27.54m ± 18%   20.27m ± 83%        ~ (p=0.132 n=6)
ApplyMultipleResources/20Resources-12           53.59m ± 11%   35.22m ±  6%  -34.28% (p=0.002 n=6)
Patch/StrategicMerge-12                         5.245m ± 59%   5.516m ± 16%        ~ (p=0.180 n=6)
Patch/JSONPatch-12                              4.564m ± 17%   4.287m ± 12%   -6.06% (p=0.009 n=6)
Patch/MergePatch-12                             4.228m ± 14%   4.618m ± 32%        ~ (p=0.394 n=6)
ApplyRuntimeObject/ConfigMap-12                 2.923m ± 26%   2.067m ±  8%  -29.29% (p=0.002 n=6)
ApplyRuntimeObject/Secret-12                    5.418m ± 13%   4.396m ± 22%  -18.87% (p=0.002 n=6)
ApplyRuntimeObject/ServiceAccount-12            2.927m ± 14%   2.768m ± 17%        ~ (p=0.310 n=6)
ControllerReconcileSyncSet-12                    1.137 ± 21%    1.045 ± 20%        ~ (p=0.485 n=6)
ControllerReconcileClusterInstall-12           1181.8m ± 32%   974.8m ± 25%        ~ (p=0.310 n=6)
ControllerReconcileClusterPool/3Clusters-12     23.36m ± 25%   17.17m ± 13%  -26.48% (p=0.002 n=6)
ControllerReconcileClusterPool/5Clusters-12     42.82m ± 17%   29.61m ± 12%  -30.84% (p=0.002 n=6)
ControllerReconcileClusterPool/10Clusters-12    66.50m ± 24%   57.27m ± 12%  -13.88% (p=0.041 n=6)
ControllerReconcileIdentityProvider-12           1.044 ± 17%    1.031 ± 22%        ~ (p=0.937 n=6)
geomean                                         18.34m         14.11m        -23.05%

                                             │  before.txt  │             after.txt              │
                                             │     B/op     │     B/op      vs base              │
Apply/ConfigMap-12                             659.4Ki ± 1%   659.0Ki ± 1%       ~ (p=0.394 n=6)
Apply/Secret-12                                815.3Ki ± 2%   811.3Ki ± 1%       ~ (p=0.818 n=6)
Apply/Deployment-12                            1.065Mi ± 1%   1.063Mi ± 1%       ~ (p=0.818 n=6)
Apply/ServiceAccount-12                        812.9Ki ± 1%   809.2Ki ± 1%       ~ (p=0.699 n=6)
ApplyMultipleResources/5Resources-12           3.564Mi ± 1%   3.598Mi ± 1%  +0.94% (p=0.026 n=6)
ApplyMultipleResources/10Resources-12          7.137Mi ± 1%   7.116Mi ± 1%       ~ (p=0.394 n=6)
ApplyMultipleResources/20Resources-12          14.34Mi ± 1%   14.37Mi ± 1%       ~ (p=0.240 n=6)
Patch/StrategicMerge-12                        849.7Ki ± 2%   847.4Ki ± 3%       ~ (p=0.589 n=6)
Patch/JSONPatch-12                             865.9Ki ± 2%   867.9Ki ± 2%  +0.23% (p=0.041 n=6)
Patch/MergePatch-12                            845.4Ki ± 3%   849.0Ki ± 2%       ~ (p=0.240 n=6)
ApplyRuntimeObject/ConfigMap-12                652.2Ki ± 2%   645.8Ki ± 1%       ~ (p=0.699 n=6)
ApplyRuntimeObject/Secret-12                   820.4Ki ± 2%   828.9Ki ± 0%  +1.03% (p=0.017 n=6)
ApplyRuntimeObject/ServiceAccount-12           805.2Ki ± 0%   803.2Ki ± 1%  -0.25% (p=0.009 n=6)
ControllerReconcileSyncSet-12                  17.68Mi ± 0%   17.64Mi ± 0%  -0.21% (p=0.009 n=6)
ControllerReconcileClusterInstall-12           18.31Mi ± 0%   18.29Mi ± 0%       ~ (p=0.394 n=6)
ControllerReconcileClusterPool/3Clusters-12    4.655Mi ± 1%   4.641Mi ± 1%       ~ (p=0.589 n=6)
ControllerReconcileClusterPool/5Clusters-12    7.650Mi ± 0%   7.718Mi ± 1%  +0.88% (p=0.026 n=6)
ControllerReconcileClusterPool/10Clusters-12   15.36Mi ± 0%   15.40Mi ± 1%       ~ (p=0.589 n=6)
ControllerReconcileIdentityProvider-12         15.29Mi ± 0%   15.31Mi ± 0%       ~ (p=0.310 n=6)
geomean                                        2.636Mi        2.637Mi       +0.02%

                                             │ before.txt  │             after.txt             │
                                             │  allocs/op  │  allocs/op   vs base              │
Apply/ConfigMap-12                             4.763k ± 1%   4.753k ± 0%  -0.21% (p=0.026 n=6)
Apply/Secret-12                                6.475k ± 0%   6.474k ± 0%       ~ (p=0.779 n=6)
Apply/Deployment-12                            9.887k ± 0%   9.898k ± 0%       ~ (p=0.240 n=6)
Apply/ServiceAccount-12                        6.362k ± 0%   6.363k ± 0%       ~ (p=0.485 n=6)
ApplyMultipleResources/5Resources-12           28.72k ± 0%   28.73k ± 0%       ~ (p=0.974 n=6)
ApplyMultipleResources/10Resources-12          57.32k ± 0%   57.28k ± 0%  -0.06% (p=0.026 n=6)
ApplyMultipleResources/20Resources-12          116.3k ± 0%   116.3k ± 0%       ~ (p=0.420 n=6)
Patch/StrategicMerge-12                        6.774k ± 0%   6.769k ± 0%       ~ (p=0.286 n=6)
Patch/JSONPatch-12                             6.864k ± 0%   6.865k ± 0%       ~ (p=0.459 n=6)
Patch/MergePatch-12                            6.769k ± 0%   6.771k ± 0%       ~ (p=0.377 n=6)
ApplyRuntimeObject/ConfigMap-12                4.585k ± 0%   4.582k ± 0%       ~ (p=0.697 n=6)
ApplyRuntimeObject/Secret-12                   6.598k ± 0%   6.600k ± 0%       ~ (p=0.794 n=6)
ApplyRuntimeObject/ServiceAccount-12           6.260k ± 0%   6.255k ± 0%       ~ (p=0.069 n=6)
ControllerReconcileSyncSet-12                  83.28k ± 0%   82.46k ± 0%  -0.98% (p=0.002 n=6)
ControllerReconcileClusterInstall-12           87.45k ± 0%   87.16k ± 0%  -0.34% (p=0.009 n=6)
ControllerReconcileClusterPool/3Clusters-12    38.47k ± 0%   38.46k ± 0%       ~ (p=0.310 n=6)
ControllerReconcileClusterPool/5Clusters-12    64.05k ± 0%   64.08k ± 0%       ~ (p=0.180 n=6)
ControllerReconcileClusterPool/10Clusters-12   128.2k ± 0%   128.1k ± 0%       ~ (p=0.104 n=6)
ControllerReconcileIdentityProvider-12         63.14k ± 0%   63.21k ± 1%  +0.10% (p=0.041 n=6)
geomean                                        19.49k        19.48k       -0.08%

My opinion: 23% speedup from not allocating buffers (and not GCing them) is great, but the memory effects are negligible. Slight overhead on memory from the pool (+0.02%), and the actual allocation savings are a drop in the bucket compared to the rest of the allocations made (-0.08%).

/assign @2uasimojo

Summary by CodeRabbit

  • New Features

    • Added comprehensive benchmarking suite for Apply, Patch, and concurrent operations.
    • Introduced buffer pooling optimization to improve performance of resource application operations.
    • Added environment test infrastructure setup scripts and utilities for streamlined cluster testing.
  • Tests

    • Added benchmark tests for controller reconciliation workflows and resource operations.
  • Chores

    • Enhanced build system with environment test setup tooling and benchmark comparison utilities.
    • Updated dependencies to include Kubernetes testing infrastructure.

dlom added 2 commits March 16, 2026 19:06
Claude wrote the actual benchmarks and the script for comparing them.
Go claude! It's very specific to this benchmark I'm testing, so I'll
update it later to make it more generic

Assisted-by: Claude Sonnet 4.5 (Anthropic)
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 17, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 17, 2026

@dlom: This pull request references HIVE-2139 which is a valid jira issue.

Details

In response to this:

xref: HIVE-2139

This is the first of potentially multiple PRs addressing memory issues in the clustersync controller (and hive in general). This PR is split into two commits: The first adds a basic benchmark harness for the resource helper, focused mainly on driving the Apply and Patch methods. This pulls in envtest as a test dependency. The second commit is the actual optimization: reusing allocated buffers in the resource helper.

Preliminary benchmark results: -benchtime=3x -count=6

goos: linux
goarch: amd64
pkg: github.com/openshift/hive/pkg/resource
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
                                            │  before.txt   │              after.txt              │
                                            │    sec/op     │    sec/op     vs base               │
Apply/ConfigMap-12                              2.531m ± 26%   1.725m ± 22%  -31.85% (p=0.002 n=6)
Apply/Secret-12                                 3.327m ± 47%   1.934m ± 15%  -41.89% (p=0.002 n=6)
Apply/Deployment-12                             4.249m ± 16%   2.332m ± 13%  -45.10% (p=0.002 n=6)
Apply/ServiceAccount-12                         3.537m ± 25%   1.786m ±  6%  -49.52% (p=0.002 n=6)
ApplyMultipleResources/5Resources-12           12.340m ± 13%   8.327m ±  7%  -32.52% (p=0.002 n=6)
ApplyMultipleResources/10Resources-12           27.54m ± 18%   20.27m ± 83%        ~ (p=0.132 n=6)
ApplyMultipleResources/20Resources-12           53.59m ± 11%   35.22m ±  6%  -34.28% (p=0.002 n=6)
Patch/StrategicMerge-12                         5.245m ± 59%   5.516m ± 16%        ~ (p=0.180 n=6)
Patch/JSONPatch-12                              4.564m ± 17%   4.287m ± 12%   -6.06% (p=0.009 n=6)
Patch/MergePatch-12                             4.228m ± 14%   4.618m ± 32%        ~ (p=0.394 n=6)
ApplyRuntimeObject/ConfigMap-12                 2.923m ± 26%   2.067m ±  8%  -29.29% (p=0.002 n=6)
ApplyRuntimeObject/Secret-12                    5.418m ± 13%   4.396m ± 22%  -18.87% (p=0.002 n=6)
ApplyRuntimeObject/ServiceAccount-12            2.927m ± 14%   2.768m ± 17%        ~ (p=0.310 n=6)
ControllerReconcileSyncSet-12                    1.137 ± 21%    1.045 ± 20%        ~ (p=0.485 n=6)
ControllerReconcileClusterInstall-12           1181.8m ± 32%   974.8m ± 25%        ~ (p=0.310 n=6)
ControllerReconcileClusterPool/3Clusters-12     23.36m ± 25%   17.17m ± 13%  -26.48% (p=0.002 n=6)
ControllerReconcileClusterPool/5Clusters-12     42.82m ± 17%   29.61m ± 12%  -30.84% (p=0.002 n=6)
ControllerReconcileClusterPool/10Clusters-12    66.50m ± 24%   57.27m ± 12%  -13.88% (p=0.041 n=6)
ControllerReconcileIdentityProvider-12           1.044 ± 17%    1.031 ± 22%        ~ (p=0.937 n=6)
geomean                                         18.34m         14.11m        -23.05%

                                            │  before.txt  │             after.txt              │
                                            │     B/op     │     B/op      vs base              │
Apply/ConfigMap-12                             659.4Ki ± 1%   659.0Ki ± 1%       ~ (p=0.394 n=6)
Apply/Secret-12                                815.3Ki ± 2%   811.3Ki ± 1%       ~ (p=0.818 n=6)
Apply/Deployment-12                            1.065Mi ± 1%   1.063Mi ± 1%       ~ (p=0.818 n=6)
Apply/ServiceAccount-12                        812.9Ki ± 1%   809.2Ki ± 1%       ~ (p=0.699 n=6)
ApplyMultipleResources/5Resources-12           3.564Mi ± 1%   3.598Mi ± 1%  +0.94% (p=0.026 n=6)
ApplyMultipleResources/10Resources-12          7.137Mi ± 1%   7.116Mi ± 1%       ~ (p=0.394 n=6)
ApplyMultipleResources/20Resources-12          14.34Mi ± 1%   14.37Mi ± 1%       ~ (p=0.240 n=6)
Patch/StrategicMerge-12                        849.7Ki ± 2%   847.4Ki ± 3%       ~ (p=0.589 n=6)
Patch/JSONPatch-12                             865.9Ki ± 2%   867.9Ki ± 2%  +0.23% (p=0.041 n=6)
Patch/MergePatch-12                            845.4Ki ± 3%   849.0Ki ± 2%       ~ (p=0.240 n=6)
ApplyRuntimeObject/ConfigMap-12                652.2Ki ± 2%   645.8Ki ± 1%       ~ (p=0.699 n=6)
ApplyRuntimeObject/Secret-12                   820.4Ki ± 2%   828.9Ki ± 0%  +1.03% (p=0.017 n=6)
ApplyRuntimeObject/ServiceAccount-12           805.2Ki ± 0%   803.2Ki ± 1%  -0.25% (p=0.009 n=6)
ControllerReconcileSyncSet-12                  17.68Mi ± 0%   17.64Mi ± 0%  -0.21% (p=0.009 n=6)
ControllerReconcileClusterInstall-12           18.31Mi ± 0%   18.29Mi ± 0%       ~ (p=0.394 n=6)
ControllerReconcileClusterPool/3Clusters-12    4.655Mi ± 1%   4.641Mi ± 1%       ~ (p=0.589 n=6)
ControllerReconcileClusterPool/5Clusters-12    7.650Mi ± 0%   7.718Mi ± 1%  +0.88% (p=0.026 n=6)
ControllerReconcileClusterPool/10Clusters-12   15.36Mi ± 0%   15.40Mi ± 1%       ~ (p=0.589 n=6)
ControllerReconcileIdentityProvider-12         15.29Mi ± 0%   15.31Mi ± 0%       ~ (p=0.310 n=6)
geomean                                        2.636Mi        2.637Mi       +0.02%

                                            │ before.txt  │             after.txt             │
                                            │  allocs/op  │  allocs/op   vs base              │
Apply/ConfigMap-12                             4.763k ± 1%   4.753k ± 0%  -0.21% (p=0.026 n=6)
Apply/Secret-12                                6.475k ± 0%   6.474k ± 0%       ~ (p=0.779 n=6)
Apply/Deployment-12                            9.887k ± 0%   9.898k ± 0%       ~ (p=0.240 n=6)
Apply/ServiceAccount-12                        6.362k ± 0%   6.363k ± 0%       ~ (p=0.485 n=6)
ApplyMultipleResources/5Resources-12           28.72k ± 0%   28.73k ± 0%       ~ (p=0.974 n=6)
ApplyMultipleResources/10Resources-12          57.32k ± 0%   57.28k ± 0%  -0.06% (p=0.026 n=6)
ApplyMultipleResources/20Resources-12          116.3k ± 0%   116.3k ± 0%       ~ (p=0.420 n=6)
Patch/StrategicMerge-12                        6.774k ± 0%   6.769k ± 0%       ~ (p=0.286 n=6)
Patch/JSONPatch-12                             6.864k ± 0%   6.865k ± 0%       ~ (p=0.459 n=6)
Patch/MergePatch-12                            6.769k ± 0%   6.771k ± 0%       ~ (p=0.377 n=6)
ApplyRuntimeObject/ConfigMap-12                4.585k ± 0%   4.582k ± 0%       ~ (p=0.697 n=6)
ApplyRuntimeObject/Secret-12                   6.598k ± 0%   6.600k ± 0%       ~ (p=0.794 n=6)
ApplyRuntimeObject/ServiceAccount-12           6.260k ± 0%   6.255k ± 0%       ~ (p=0.069 n=6)
ControllerReconcileSyncSet-12                  83.28k ± 0%   82.46k ± 0%  -0.98% (p=0.002 n=6)
ControllerReconcileClusterInstall-12           87.45k ± 0%   87.16k ± 0%  -0.34% (p=0.009 n=6)
ControllerReconcileClusterPool/3Clusters-12    38.47k ± 0%   38.46k ± 0%       ~ (p=0.310 n=6)
ControllerReconcileClusterPool/5Clusters-12    64.05k ± 0%   64.08k ± 0%       ~ (p=0.180 n=6)
ControllerReconcileClusterPool/10Clusters-12   128.2k ± 0%   128.1k ± 0%       ~ (p=0.104 n=6)
ControllerReconcileIdentityProvider-12         63.14k ± 0%   63.21k ± 1%  +0.10% (p=0.041 n=6)
geomean                                        19.49k        19.48k       -0.08%

My opinion: 23% speedup from not allocating buffers (and not GCing them) is great, but the memory effects are negligible. Slight overhead on memory from the pool (+0.02%), and the actual allocation savings are a drop in the bucket compared to the rest of the allocations made (-0.08%).

/assign @2uasimojo

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link

coderabbitai bot commented Mar 17, 2026

📝 Walkthrough

Walkthrough

Introduces environment test (envtest) setup automation and buffer pooling for resource efficiency. Adds setup-envtest tooling integration through build system changes, vendors substantial controller-runtime testing infrastructure, and implements comprehensive benchmark suites for Apply, Patch, and controller reconciliation operations.

Changes

Cohort / File(s) Summary
Build System & Tool Integration
Makefile, go.mod, pkg/dependencymagnet/doc.go
Adds setup-envtest target and build dependency; includes setup-envtest as blank import for initialization during builds.
Build Automation Scripts
hack/setup-envtest.sh, hack/benchmark-apply-comparison.sh
Introduces environment setup automation and benchmark comparison tooling; setup-envtest configures Kubernetes API binaries, benchmark script stashes and compares performance with/without buffer pooling.
Buffer Pool & Resource Management
pkg/resource/bufferpool.go, pkg/resource/apply.go, pkg/resource/patch.go, pkg/resource/serializer.go
Implements sync.Pool-based buffer pooling with getBuffer/returnBuffer and getIOStreams/returnIOStreams helpers; refactors Apply and Patch to use pooled resources; updates serializer comment.
Benchmark Test Suite
pkg/resource/apply_benchmark_test.go, pkg/resource/controller_sim_benchmark_test.go
Adds comprehensive benchmarks for Apply, Patch, ApplyRuntimeObject, ConcurrentApply, and controller reconciliation workflows (SyncSet, ClusterInstall, ClusterPool, IdentityProvider); includes envtest setup and Hive scheme integration.
Vendored Envtest Infrastructure
vendor/sigs.k8s.io/controller-runtime/pkg/envtest/*, vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/*, vendor/sigs.k8s.io/controller-runtime/pkg/internal/flock/*, vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/*, vendor/modules.txt
Vendors complete controller-runtime envtest ecosystem including environment initialization, API server/Etcd lifecycle, CRD/webhook installation, certificate generation, process management, argument handling, HTTP client for binary distribution, and version parsing utilities.

Sequence Diagram

sequenceDiagram
    participant Test as Test Runner
    participant Env as Environment
    participant ControlPlane as Control Plane
    participant APIServer as API Server
    participant Etcd as Etcd
    participant CRDMgr as CRD Manager
    participant WebhookMgr as Webhook Manager

    Test->>Env: Start()
    Env->>ControlPlane: Initialize
    ControlPlane->>Etcd: Start()
    activate Etcd
    Etcd-->>ControlPlane: Ready
    deactivate Etcd
    ControlPlane->>APIServer: Start()
    activate APIServer
    APIServer-->>ControlPlane: Ready
    deactivate APIServer
    Env->>Env: Wait for Default Namespace
    Env->>CRDMgr: InstallCRDs()
    activate CRDMgr
    CRDMgr->>APIServer: Create CRD Resources
    CRDMgr->>APIServer: Wait for Discovery
    CRDMgr-->>Env: Complete
    deactivate CRDMgr
    Env->>WebhookMgr: Install Webhooks
    activate WebhookMgr
    WebhookMgr->>APIServer: Setup Webhook Configs
    WebhookMgr->>APIServer: Verify Availability
    WebhookMgr-->>Env: Complete
    deactivate WebhookMgr
    Env-->>Test: REST Config Ready
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A pooling we shall go!
Buffers dance in sync and flow,
Setup-envtest guides the way,
Benchmarks test both night and day,
Infrastructure springs to life,
Cutting through the testing strife! 🏃‍♂️✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 83.78% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly summarizes the main optimization: reusing allocated buffers in the resource helper via buffer pooling, which is the core objective of this changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot requested review from 2uasimojo and suhanime March 17, 2026 02:21
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 17, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dlom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 17, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 17, 2026

@dlom: This pull request references HIVE-2139 which is a valid jira issue.

Details

In response to this:

xref: HIVE-2139

This is the first of potentially multiple PRs addressing memory issues in the clustersync controller (and hive in general). This PR is split into two commits: The first adds a basic benchmark harness for the resource helper, focused mainly on driving the Apply and Patch methods. This pulls in envtest as a test dependency. The second commit is the actual optimization: reusing allocated buffers in the resource helper.

Claude wrote the benchmarks and the comparison script for me, so take them with a grain of salt

Preliminary benchmark results: -benchtime=3x -count=6

goos: linux
goarch: amd64
pkg: github.com/openshift/hive/pkg/resource
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
                                            │  before.txt   │              after.txt              │
                                            │    sec/op     │    sec/op     vs base               │
Apply/ConfigMap-12                              2.531m ± 26%   1.725m ± 22%  -31.85% (p=0.002 n=6)
Apply/Secret-12                                 3.327m ± 47%   1.934m ± 15%  -41.89% (p=0.002 n=6)
Apply/Deployment-12                             4.249m ± 16%   2.332m ± 13%  -45.10% (p=0.002 n=6)
Apply/ServiceAccount-12                         3.537m ± 25%   1.786m ±  6%  -49.52% (p=0.002 n=6)
ApplyMultipleResources/5Resources-12           12.340m ± 13%   8.327m ±  7%  -32.52% (p=0.002 n=6)
ApplyMultipleResources/10Resources-12           27.54m ± 18%   20.27m ± 83%        ~ (p=0.132 n=6)
ApplyMultipleResources/20Resources-12           53.59m ± 11%   35.22m ±  6%  -34.28% (p=0.002 n=6)
Patch/StrategicMerge-12                         5.245m ± 59%   5.516m ± 16%        ~ (p=0.180 n=6)
Patch/JSONPatch-12                              4.564m ± 17%   4.287m ± 12%   -6.06% (p=0.009 n=6)
Patch/MergePatch-12                             4.228m ± 14%   4.618m ± 32%        ~ (p=0.394 n=6)
ApplyRuntimeObject/ConfigMap-12                 2.923m ± 26%   2.067m ±  8%  -29.29% (p=0.002 n=6)
ApplyRuntimeObject/Secret-12                    5.418m ± 13%   4.396m ± 22%  -18.87% (p=0.002 n=6)
ApplyRuntimeObject/ServiceAccount-12            2.927m ± 14%   2.768m ± 17%        ~ (p=0.310 n=6)
ControllerReconcileSyncSet-12                    1.137 ± 21%    1.045 ± 20%        ~ (p=0.485 n=6)
ControllerReconcileClusterInstall-12           1181.8m ± 32%   974.8m ± 25%        ~ (p=0.310 n=6)
ControllerReconcileClusterPool/3Clusters-12     23.36m ± 25%   17.17m ± 13%  -26.48% (p=0.002 n=6)
ControllerReconcileClusterPool/5Clusters-12     42.82m ± 17%   29.61m ± 12%  -30.84% (p=0.002 n=6)
ControllerReconcileClusterPool/10Clusters-12    66.50m ± 24%   57.27m ± 12%  -13.88% (p=0.041 n=6)
ControllerReconcileIdentityProvider-12           1.044 ± 17%    1.031 ± 22%        ~ (p=0.937 n=6)
geomean                                         18.34m         14.11m        -23.05%

                                            │  before.txt  │             after.txt              │
                                            │     B/op     │     B/op      vs base              │
Apply/ConfigMap-12                             659.4Ki ± 1%   659.0Ki ± 1%       ~ (p=0.394 n=6)
Apply/Secret-12                                815.3Ki ± 2%   811.3Ki ± 1%       ~ (p=0.818 n=6)
Apply/Deployment-12                            1.065Mi ± 1%   1.063Mi ± 1%       ~ (p=0.818 n=6)
Apply/ServiceAccount-12                        812.9Ki ± 1%   809.2Ki ± 1%       ~ (p=0.699 n=6)
ApplyMultipleResources/5Resources-12           3.564Mi ± 1%   3.598Mi ± 1%  +0.94% (p=0.026 n=6)
ApplyMultipleResources/10Resources-12          7.137Mi ± 1%   7.116Mi ± 1%       ~ (p=0.394 n=6)
ApplyMultipleResources/20Resources-12          14.34Mi ± 1%   14.37Mi ± 1%       ~ (p=0.240 n=6)
Patch/StrategicMerge-12                        849.7Ki ± 2%   847.4Ki ± 3%       ~ (p=0.589 n=6)
Patch/JSONPatch-12                             865.9Ki ± 2%   867.9Ki ± 2%  +0.23% (p=0.041 n=6)
Patch/MergePatch-12                            845.4Ki ± 3%   849.0Ki ± 2%       ~ (p=0.240 n=6)
ApplyRuntimeObject/ConfigMap-12                652.2Ki ± 2%   645.8Ki ± 1%       ~ (p=0.699 n=6)
ApplyRuntimeObject/Secret-12                   820.4Ki ± 2%   828.9Ki ± 0%  +1.03% (p=0.017 n=6)
ApplyRuntimeObject/ServiceAccount-12           805.2Ki ± 0%   803.2Ki ± 1%  -0.25% (p=0.009 n=6)
ControllerReconcileSyncSet-12                  17.68Mi ± 0%   17.64Mi ± 0%  -0.21% (p=0.009 n=6)
ControllerReconcileClusterInstall-12           18.31Mi ± 0%   18.29Mi ± 0%       ~ (p=0.394 n=6)
ControllerReconcileClusterPool/3Clusters-12    4.655Mi ± 1%   4.641Mi ± 1%       ~ (p=0.589 n=6)
ControllerReconcileClusterPool/5Clusters-12    7.650Mi ± 0%   7.718Mi ± 1%  +0.88% (p=0.026 n=6)
ControllerReconcileClusterPool/10Clusters-12   15.36Mi ± 0%   15.40Mi ± 1%       ~ (p=0.589 n=6)
ControllerReconcileIdentityProvider-12         15.29Mi ± 0%   15.31Mi ± 0%       ~ (p=0.310 n=6)
geomean                                        2.636Mi        2.637Mi       +0.02%

                                            │ before.txt  │             after.txt             │
                                            │  allocs/op  │  allocs/op   vs base              │
Apply/ConfigMap-12                             4.763k ± 1%   4.753k ± 0%  -0.21% (p=0.026 n=6)
Apply/Secret-12                                6.475k ± 0%   6.474k ± 0%       ~ (p=0.779 n=6)
Apply/Deployment-12                            9.887k ± 0%   9.898k ± 0%       ~ (p=0.240 n=6)
Apply/ServiceAccount-12                        6.362k ± 0%   6.363k ± 0%       ~ (p=0.485 n=6)
ApplyMultipleResources/5Resources-12           28.72k ± 0%   28.73k ± 0%       ~ (p=0.974 n=6)
ApplyMultipleResources/10Resources-12          57.32k ± 0%   57.28k ± 0%  -0.06% (p=0.026 n=6)
ApplyMultipleResources/20Resources-12          116.3k ± 0%   116.3k ± 0%       ~ (p=0.420 n=6)
Patch/StrategicMerge-12                        6.774k ± 0%   6.769k ± 0%       ~ (p=0.286 n=6)
Patch/JSONPatch-12                             6.864k ± 0%   6.865k ± 0%       ~ (p=0.459 n=6)
Patch/MergePatch-12                            6.769k ± 0%   6.771k ± 0%       ~ (p=0.377 n=6)
ApplyRuntimeObject/ConfigMap-12                4.585k ± 0%   4.582k ± 0%       ~ (p=0.697 n=6)
ApplyRuntimeObject/Secret-12                   6.598k ± 0%   6.600k ± 0%       ~ (p=0.794 n=6)
ApplyRuntimeObject/ServiceAccount-12           6.260k ± 0%   6.255k ± 0%       ~ (p=0.069 n=6)
ControllerReconcileSyncSet-12                  83.28k ± 0%   82.46k ± 0%  -0.98% (p=0.002 n=6)
ControllerReconcileClusterInstall-12           87.45k ± 0%   87.16k ± 0%  -0.34% (p=0.009 n=6)
ControllerReconcileClusterPool/3Clusters-12    38.47k ± 0%   38.46k ± 0%       ~ (p=0.310 n=6)
ControllerReconcileClusterPool/5Clusters-12    64.05k ± 0%   64.08k ± 0%       ~ (p=0.180 n=6)
ControllerReconcileClusterPool/10Clusters-12   128.2k ± 0%   128.1k ± 0%       ~ (p=0.104 n=6)
ControllerReconcileIdentityProvider-12         63.14k ± 0%   63.21k ± 1%  +0.10% (p=0.041 n=6)
geomean                                        19.49k        19.48k       -0.08%

My opinion: 23% speedup from not allocating buffers (and not GCing them) is great, but the memory effects are negligible. Slight overhead on memory from the pool (+0.02%), and the actual allocation savings are a drop in the bucket compared to the rest of the allocations made (-0.08%).

/assign @2uasimojo

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@dlom
Copy link
Contributor Author

dlom commented Mar 17, 2026

/retitle [WIP] HIVE-2139: Reuse allocated buffers in the resource helper

@openshift-ci openshift-ci bot changed the title HIVE-2139: Reuse allocated buffers in the resource helper [WIP] HIVE-2139: Reuse allocated buffers in the resource helper Mar 17, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 17, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 17, 2026

@dlom: This pull request references HIVE-2139 which is a valid jira issue.

Details

In response to this:

xref: HIVE-2139

This is the first of potentially multiple PRs addressing memory issues in the clustersync controller (and hive in general). This PR is split into two commits: The first adds a basic benchmark harness for the resource helper, focused mainly on driving the Apply and Patch methods. This pulls in envtest as a test dependency. The second commit is the actual optimization: reusing allocated buffers in the resource helper.

Claude wrote the benchmarks and the comparison script for me, so take them with a grain of salt

Preliminary benchmark results: -benchtime=3x -count=6

goos: linux
goarch: amd64
pkg: github.com/openshift/hive/pkg/resource
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
                                            │  before.txt   │              after.txt              │
                                            │    sec/op     │    sec/op     vs base               │
Apply/ConfigMap-12                              2.531m ± 26%   1.725m ± 22%  -31.85% (p=0.002 n=6)
Apply/Secret-12                                 3.327m ± 47%   1.934m ± 15%  -41.89% (p=0.002 n=6)
Apply/Deployment-12                             4.249m ± 16%   2.332m ± 13%  -45.10% (p=0.002 n=6)
Apply/ServiceAccount-12                         3.537m ± 25%   1.786m ±  6%  -49.52% (p=0.002 n=6)
ApplyMultipleResources/5Resources-12           12.340m ± 13%   8.327m ±  7%  -32.52% (p=0.002 n=6)
ApplyMultipleResources/10Resources-12           27.54m ± 18%   20.27m ± 83%        ~ (p=0.132 n=6)
ApplyMultipleResources/20Resources-12           53.59m ± 11%   35.22m ±  6%  -34.28% (p=0.002 n=6)
Patch/StrategicMerge-12                         5.245m ± 59%   5.516m ± 16%        ~ (p=0.180 n=6)
Patch/JSONPatch-12                              4.564m ± 17%   4.287m ± 12%   -6.06% (p=0.009 n=6)
Patch/MergePatch-12                             4.228m ± 14%   4.618m ± 32%        ~ (p=0.394 n=6)
ApplyRuntimeObject/ConfigMap-12                 2.923m ± 26%   2.067m ±  8%  -29.29% (p=0.002 n=6)
ApplyRuntimeObject/Secret-12                    5.418m ± 13%   4.396m ± 22%  -18.87% (p=0.002 n=6)
ApplyRuntimeObject/ServiceAccount-12            2.927m ± 14%   2.768m ± 17%        ~ (p=0.310 n=6)
ControllerReconcileSyncSet-12                    1.137 ± 21%    1.045 ± 20%        ~ (p=0.485 n=6)
ControllerReconcileClusterInstall-12           1181.8m ± 32%   974.8m ± 25%        ~ (p=0.310 n=6)
ControllerReconcileClusterPool/3Clusters-12     23.36m ± 25%   17.17m ± 13%  -26.48% (p=0.002 n=6)
ControllerReconcileClusterPool/5Clusters-12     42.82m ± 17%   29.61m ± 12%  -30.84% (p=0.002 n=6)
ControllerReconcileClusterPool/10Clusters-12    66.50m ± 24%   57.27m ± 12%  -13.88% (p=0.041 n=6)
ControllerReconcileIdentityProvider-12           1.044 ± 17%    1.031 ± 22%        ~ (p=0.937 n=6)
geomean                                         18.34m         14.11m        -23.05%

                                            │  before.txt  │             after.txt              │
                                            │     B/op     │     B/op      vs base              │
Apply/ConfigMap-12                             659.4Ki ± 1%   659.0Ki ± 1%       ~ (p=0.394 n=6)
Apply/Secret-12                                815.3Ki ± 2%   811.3Ki ± 1%       ~ (p=0.818 n=6)
Apply/Deployment-12                            1.065Mi ± 1%   1.063Mi ± 1%       ~ (p=0.818 n=6)
Apply/ServiceAccount-12                        812.9Ki ± 1%   809.2Ki ± 1%       ~ (p=0.699 n=6)
ApplyMultipleResources/5Resources-12           3.564Mi ± 1%   3.598Mi ± 1%  +0.94% (p=0.026 n=6)
ApplyMultipleResources/10Resources-12          7.137Mi ± 1%   7.116Mi ± 1%       ~ (p=0.394 n=6)
ApplyMultipleResources/20Resources-12          14.34Mi ± 1%   14.37Mi ± 1%       ~ (p=0.240 n=6)
Patch/StrategicMerge-12                        849.7Ki ± 2%   847.4Ki ± 3%       ~ (p=0.589 n=6)
Patch/JSONPatch-12                             865.9Ki ± 2%   867.9Ki ± 2%  +0.23% (p=0.041 n=6)
Patch/MergePatch-12                            845.4Ki ± 3%   849.0Ki ± 2%       ~ (p=0.240 n=6)
ApplyRuntimeObject/ConfigMap-12                652.2Ki ± 2%   645.8Ki ± 1%       ~ (p=0.699 n=6)
ApplyRuntimeObject/Secret-12                   820.4Ki ± 2%   828.9Ki ± 0%  +1.03% (p=0.017 n=6)
ApplyRuntimeObject/ServiceAccount-12           805.2Ki ± 0%   803.2Ki ± 1%  -0.25% (p=0.009 n=6)
ControllerReconcileSyncSet-12                  17.68Mi ± 0%   17.64Mi ± 0%  -0.21% (p=0.009 n=6)
ControllerReconcileClusterInstall-12           18.31Mi ± 0%   18.29Mi ± 0%       ~ (p=0.394 n=6)
ControllerReconcileClusterPool/3Clusters-12    4.655Mi ± 1%   4.641Mi ± 1%       ~ (p=0.589 n=6)
ControllerReconcileClusterPool/5Clusters-12    7.650Mi ± 0%   7.718Mi ± 1%  +0.88% (p=0.026 n=6)
ControllerReconcileClusterPool/10Clusters-12   15.36Mi ± 0%   15.40Mi ± 1%       ~ (p=0.589 n=6)
ControllerReconcileIdentityProvider-12         15.29Mi ± 0%   15.31Mi ± 0%       ~ (p=0.310 n=6)
geomean                                        2.636Mi        2.637Mi       +0.02%

                                            │ before.txt  │             after.txt             │
                                            │  allocs/op  │  allocs/op   vs base              │
Apply/ConfigMap-12                             4.763k ± 1%   4.753k ± 0%  -0.21% (p=0.026 n=6)
Apply/Secret-12                                6.475k ± 0%   6.474k ± 0%       ~ (p=0.779 n=6)
Apply/Deployment-12                            9.887k ± 0%   9.898k ± 0%       ~ (p=0.240 n=6)
Apply/ServiceAccount-12                        6.362k ± 0%   6.363k ± 0%       ~ (p=0.485 n=6)
ApplyMultipleResources/5Resources-12           28.72k ± 0%   28.73k ± 0%       ~ (p=0.974 n=6)
ApplyMultipleResources/10Resources-12          57.32k ± 0%   57.28k ± 0%  -0.06% (p=0.026 n=6)
ApplyMultipleResources/20Resources-12          116.3k ± 0%   116.3k ± 0%       ~ (p=0.420 n=6)
Patch/StrategicMerge-12                        6.774k ± 0%   6.769k ± 0%       ~ (p=0.286 n=6)
Patch/JSONPatch-12                             6.864k ± 0%   6.865k ± 0%       ~ (p=0.459 n=6)
Patch/MergePatch-12                            6.769k ± 0%   6.771k ± 0%       ~ (p=0.377 n=6)
ApplyRuntimeObject/ConfigMap-12                4.585k ± 0%   4.582k ± 0%       ~ (p=0.697 n=6)
ApplyRuntimeObject/Secret-12                   6.598k ± 0%   6.600k ± 0%       ~ (p=0.794 n=6)
ApplyRuntimeObject/ServiceAccount-12           6.260k ± 0%   6.255k ± 0%       ~ (p=0.069 n=6)
ControllerReconcileSyncSet-12                  83.28k ± 0%   82.46k ± 0%  -0.98% (p=0.002 n=6)
ControllerReconcileClusterInstall-12           87.45k ± 0%   87.16k ± 0%  -0.34% (p=0.009 n=6)
ControllerReconcileClusterPool/3Clusters-12    38.47k ± 0%   38.46k ± 0%       ~ (p=0.310 n=6)
ControllerReconcileClusterPool/5Clusters-12    64.05k ± 0%   64.08k ± 0%       ~ (p=0.180 n=6)
ControllerReconcileClusterPool/10Clusters-12   128.2k ± 0%   128.1k ± 0%       ~ (p=0.104 n=6)
ControllerReconcileIdentityProvider-12         63.14k ± 0%   63.21k ± 1%  +0.10% (p=0.041 n=6)
geomean                                        19.49k        19.48k       -0.08%

My opinion: 23% speedup from not allocating buffers (and not GCing them) is great, but the memory effects are negligible. Slight overhead on memory from the pool (+0.02%), and the actual allocation savings are a drop in the bucket compared to the rest of the allocations made (-0.08%).

/assign @2uasimojo

Summary by CodeRabbit

  • New Features

  • Added comprehensive benchmarking suite for Apply, Patch, and concurrent operations.

  • Introduced buffer pooling optimization to improve performance of resource application operations.

  • Added environment test infrastructure setup scripts and utilities for streamlined cluster testing.

  • Tests

  • Added benchmark tests for controller reconciliation workflows and resource operations.

  • Chores

  • Enhanced build system with environment test setup tooling and benchmark comparison utilities.

  • Updated dependencies to include Kubernetes testing infrastructure.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🧹 Nitpick comments (3)
hack/setup-envtest.sh (1)

3-4: Redundant shell option.

set -e and set -o errexit are equivalent; one can be removed.

♻️ Suggested simplification
 set -e
-set -o errexit
 set -o nounset
 set -o pipefail
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup-envtest.sh` around lines 3 - 4, The script redundantly enables the
same shell option twice by using both "set -e" and "set -o errexit"; remove one
of them (either "set -e" or "set -o errexit") from the top of
hack/setup-envtest.sh so the script still exits on error but avoids duplicate
settings, leaving only a single invocation that enforces errexit.
vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/controlplane/apiserver.go (1)

33-38: Minor typo in upstream comment.

Line 36-37 has a copy-paste error in the comment - both comments reference "saKeyFile" but line 37 defines saCertFile. This is a minor documentation issue in the vendored upstream controller-runtime code.

// saKeyFile is the name of the service account signing private key file.
saKeyFile = "sa-signer.key"
// saKeyFile is the name of the service account signing public key (cert) file.  // <-- should say "saCertFile"
saCertFile = "sa-signer.crt"

Since this is vendored code, consider reporting this upstream to controller-runtime if desired.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/controlplane/apiserver.go`
around lines 33 - 38, Fix the copy-paste comment: update the second comment that
currently mentions "saKeyFile" to correctly reference saCertFile so the comment
matches the constant it documents; locate the constant block containing
saKeyFile and saCertFile in apiserver.go and change the comment for the cert
constant to say "saCertFile is the name of the service account signing public
key (cert) file."
vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/controlplane/etcd.go (1)

45-60: Minor typo in upstream comment.

Line 46 has a typo: "they will be evaluated"should be"they will be evaluated"`. This is a minor documentation issue in the vendored upstream controller-runtime code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/controlplane/etcd.go`
around lines 45 - 60, The comment above the Args field in the Etcd struct
contains a typo ("the`y will be evaluated"); update the comment to read "they
will be evaluated" to correct the documentation for Args in the Etcd type (edit
the comment block that describes Args in the Etcd struct).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hack/benchmark-apply-comparison.sh`:
- Around line 37-45: The script currently uses git stash push/pop and STASHED
which is unreliable because the buffer-pool change is already committed; replace
the stash-based flow with creating a temporary git worktree for the baseline
commit (e4203bc0b or the commit hash/tag you want to benchmark), run the same go
test bench command (using BENCHTIME, COUNT and TEMP_DIR) inside that worktree
and redirect output to without.txt, then remove the temporary worktree; then run
the “with” benchmark in the main worktree producing with.txt—this avoids git
stash push/pop and the STASHED flag and prevents restoring unrelated stashes.

In `@hack/setup-envtest.sh`:
- Around line 8-10: The script currently warns when the setup-envtest binary is
missing but continues and later fails; modify the hack/setup-envtest.sh check so
that after detecting command -v setup-envtest returns false it exits with a
non-zero status (e.g., echo the message and run exit 1) to stop execution
immediately; reference the check that invokes setup-envtest and ensure any
callers won’t proceed when the binary is absent.

In `@pkg/resource/apply_benchmark_test.go`:
- Around line 290-314: The benchmark calls b.Fatalf from within goroutines
launched by b.RunParallel (in BenchmarkConcurrentApply and the inner closure
using helper.Apply), which is unsafe; change the error handling to record the
first error from concurrent goroutines using a concurrency-safe mechanism (e.g.,
an atomic.Value or an int32 flag plus storing the error string) and call
b.Fatalf (or b.Fatal) exactly once from the main benchmark goroutine after
RunParallel returns if any error was recorded; add the sync/atomic import and
modify the loop to set the atomic error/flag when helper.Apply returns an error
instead of calling b.Fatalf directly.

In `@vendor/sigs.k8s.io/controller-runtime/pkg/envtest/crd.go`:
- Around line 299-306: The code calls os.Stat(path) and only checks
os.IsNotExist(err), then dereferences info via info.IsDir(), which can panic for
non-ENOENT errors; update the logic around the os.Stat call (the variables info
and err) so you first check if err != nil: if os.IsNotExist(err) then handle
options.ErrorIfPathMissing as before (continue or return), otherwise return the
original err; only when err == nil should you dereference info and call
info.IsDir(). Ensure this change is applied to the block that uses os.Stat(path)
and the surrounding control flow that references options.ErrorIfPathMissing.

In `@vendor/sigs.k8s.io/controller-runtime/pkg/envtest/webhook.go`:
- Around line 258-267: The current loop treats apierrors.IsNotFound(err) as "not
ready" but still returns the NotFound error immediately; change the error
handling in the loop (the block that references names.Delete(name), allFound and
the err checks) so that if apierrors.IsNotFound(err) you mark allFound = false
and continue polling instead of returning err, only returning err for
non-NotFound, non-nil errors; keep names.Delete(name) only when err == nil so
successful finds still remove entries from names.
- Around line 111-118: The updateClientConfig function currently only rewrites
service-based configs when cc.Service.Path is set; change it to rewrite whenever
cc.Service != nil so manifests that omit Service.Path are converted to the local
envtest URL. In updateClientConfig, keep setting cc.CABundle, then if cc.Service
!= nil build a URL using hostPort and the optional path (e.g. if cc.Service.Path
!= nil append "/" + *cc.Service.Path, otherwise use hostPort with no extra path
or a single "/"), assign cc.URL to that value and set cc.Service = nil;
reference the updateClientConfig function and the cc.Service / cc.Service.Path
symbols to locate the change.

In `@vendor/sigs.k8s.io/controller-runtime/pkg/internal/flock/flock_unix.go`:
- Around line 33-47: In Acquire, close the file descriptor if unix.Flock fails
(both when errors.Is(err, unix.EWOULDBLOCK) and on any other flock error) to
avoid leaking fds; after calling unix.Flock(fd, ...), if err != nil then
close(fd) before returning the error or wrapping it as ErrAlreadyLocked. Also
remove the dead os.ErrExist branch after unix.Open since O_CREAT without O_EXCL
never returns os.ErrExist, simplifying the open error handling to just return
err on failure.

In `@vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/addr/manager.go`:
- Around line 124-127: The comment above Suggest (reserve-window) is out of sync
with the constant; update the comment to reflect the actual port reserve
duration used by portReserveTime (currently set to 2 minutes) or change the
constant if the intended window is 1 minute; locate the portReserveTime constant
and the reserve-window sentence near Suggest in manager.go and make the wording
and numeric value consistent (e.g., "within 2 minutes" if keeping
portReserveTime as-is).
- Around line 129-139: The loop defers listener.Close() causing listeners to
stay open across retries; remove the defer and explicitly close the listener
immediately after using it. In the retry loop that calls suggest(listenHost) and
then cache.add(port), replace the deferred Close with immediate calls to
listener.Close(): call listener.Close() before any return paths (both the
success branch where cache.add returns ok and the error branch where it returns
err) and also close it before continuing the loop when cache.add returns (false,
nil); adjust the code around suggest, listener, and cache.add accordingly so no
listener remains open across iterations.

In `@vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/env/env.go`:
- Around line 423-457: PathMatches erroneously accepts a path as soon as
versionFromPathName parses a versioned basename, which can mutate
e.Version/e.Platform to values that don't satisfy the caller's requested
selectors; change PathMatches so that it does not immediately accept/commit
parsed values from versionFromPathName — instead, first parse the
version/platform from the basename (call versionFromPathName in a non-mutating
way or extract the parsed result without assigning to e.Version/e.Platform),
verify the parsed version/platform satisfy the current selector/constraints, and
only then set e.Version, e.Platform and return true; if the parsed values do not
match the selector, continue checking binaries (and fallback logic) as before.
Ensure the same non-mutating verification approach is applied to the analogous
code path later in the file that currently returns early after
versionFromPathName succeeds.
- Around line 269-276: The current flow fetches e.Platform.Hash via
e.Client.FetchSum but never uses it when downloading and storing the archive;
modify the download path (the code that calls e.HTTPClient.GetVersion and then
e.Store.Add) to verify the downloaded bytes against e.Platform.Hash when
e.VerifySum is true: compute the same hash algorithm used by FetchSum (e.g.,
SHA-256) while streaming the response (or after full download), compare the
computed digest to e.Platform.Hash, and return/ExitCause on mismatch before
calling e.Store.Add; ensure the verification logic references e.VerifySum,
e.Platform.Hash, e.Client.FetchSum, e.HTTPClient.GetVersion and e.Store.Add so
the integrity check is applied consistently.

In `@vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/env/helpers.go`:
- Around line 31-34: Update the doc comment above the PrintEnv symbol to use the
correct command name "setup-envtest" instead of "fetch-envtest"; locate the
comment block referencing PrintEnv in helpers.go and replace the example
invocation ``source $(fetch-envtest switch -p env 1.20.x)`` with ``source
$(setup-envtest switch -p env 1.20.x)`` so the documentation matches the actual
tool name.

In `@vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/README.md`:
- Around line 107-112: Fix two typos in the README text: change “As noted about”
to “As noted above” and “gziped” to “gzipped” in the paragraph that mentions
ENVTEST_INSTALLED_ONLY and ENVTEST_USE_ENV and the follow-up sentence about the
sideload command; update the strings in the README (look for occurrences of
ENVTEST_INSTALLED_ONLY, ENVTEST_USE_ENV, and “sideload”) so the corrected
wording reads naturally.

In
`@vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/versions/platform.go`:
- Around line 35-41: BaseName currently formats only major.minor.patch which
drops prerelease parts and causes collisions; update Platform.BaseName to
include ver.Prerelease (and any build metadata if present) when constructing the
version string so prerelease-aware versions (e.g., 1.31.0-alpha.1) stay
distinct, and ensure ArchiveName (which uses BaseName) inherits that change;
locate and update the BaseName method in the Platform type and any other places
that duplicate the base-name construction (see functions named BaseName and
ArchiveName and the regex helpers ExtractWithPlatform / ConcreteVersionRE /
VersionPlatformRE for consistency) so stored/archive names preserve prerelease
identifiers.

---

Nitpick comments:
In `@hack/setup-envtest.sh`:
- Around line 3-4: The script redundantly enables the same shell option twice by
using both "set -e" and "set -o errexit"; remove one of them (either "set -e" or
"set -o errexit") from the top of hack/setup-envtest.sh so the script still
exits on error but avoids duplicate settings, leaving only a single invocation
that enforces errexit.

In
`@vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/controlplane/apiserver.go`:
- Around line 33-38: Fix the copy-paste comment: update the second comment that
currently mentions "saKeyFile" to correctly reference saCertFile so the comment
matches the constant it documents; locate the constant block containing
saKeyFile and saCertFile in apiserver.go and change the comment for the cert
constant to say "saCertFile is the name of the service account signing public
key (cert) file."

In
`@vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/controlplane/etcd.go`:
- Around line 45-60: The comment above the Args field in the Etcd struct
contains a typo ("the`y will be evaluated"); update the comment to read "they
will be evaluated" to correct the documentation for Args in the Etcd type (edit
the comment block that describes Args in the Etcd struct).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 80940d01-4185-4914-9617-ef7bf412455d

📥 Commits

Reviewing files that changed from the base of the PR and between 366515c and e4203bc.

📒 Files selected for processing (48)
  • Makefile
  • go.mod
  • hack/benchmark-apply-comparison.sh
  • hack/setup-envtest.sh
  • pkg/dependencymagnet/doc.go
  • pkg/resource/apply.go
  • pkg/resource/apply_benchmark_test.go
  • pkg/resource/bufferpool.go
  • pkg/resource/controller_sim_benchmark_test.go
  • pkg/resource/patch.go
  • pkg/resource/serializer.go
  • vendor/modules.txt
  • vendor/sigs.k8s.io/controller-runtime/pkg/envtest/crd.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/envtest/doc.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/envtest/helper.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/envtest/server.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/envtest/webhook.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/flock/doc.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/flock/errors.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/flock/flock_other.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/flock/flock_unix.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/addr/manager.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/certs/tinyca.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/controlplane/apiserver.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/controlplane/auth.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/controlplane/etcd.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/controlplane/kubectl.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/controlplane/plane.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/process/arguments.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/process/bin_path_finder.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/process/procattr_other.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/process/procattr_unix.go
  • vendor/sigs.k8s.io/controller-runtime/pkg/internal/testing/process/process.go
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/LICENSE
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/README.md
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/env/env.go
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/env/exit.go
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/env/helpers.go
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/main.go
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/remote/client.go
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/remote/http_client.go
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/remote/read_body.go
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/store/helpers.go
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/store/store.go
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/versions/parse.go
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/versions/platform.go
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/versions/version.go
  • vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/workflows/workflows.go

Comment on lines +37 to +45
git stash push -q pkg/resource/bufferpool.go pkg/resource/apply.go pkg/resource/patch.go
STASHED=1

echo "Running benchmarks WITHOUT buffer pool..."
go test -bench='^Benchmark(Apply|Patch|Controller)' -benchmem -benchtime=${BENCHTIME} -count=${COUNT} ./pkg/resource/ > "${TEMP_DIR}/without.txt" 2>&1

echo "Restoring changes..."
git stash pop -q
STASHED=0
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the full script context around lines 37-45
cat -n hack/benchmark-apply-comparison.sh | head -50

Repository: openshift/hive

Length of output: 2056


🏁 Script executed:

# Check if the files mentioned (bufferpool.go, apply.go, patch.go) exist
fd -t f "bufferpool.go|apply.go|patch.go" pkg/resource/

Repository: openshift/hive

Length of output: 128


🏁 Script executed:

# Let's also check the git log to see what commits might be involved
# Get a summary of recent commits touching these files
git log --oneline -20 -- pkg/resource/bufferpool.go pkg/resource/apply.go pkg/resource/patch.go 2>/dev/null | head -10

Repository: openshift/hive

Length of output: 105


Benchmark A/B switch is unreliable: git stash push won't stash committed changes.

The buffer pool optimization (commit e4203bc) is already committed, so git stash push on lines 37–38 stashes nothing. Both benchmark runs measure identical code. Additionally, line 44's git stash pop -q may restore an unrelated pre-existing stash, corrupting developer state.

Use a temporary worktree to safely benchmark the baseline commit:

🛠️ Safer approach: benchmark baseline via a temporary worktree
 TEMP_DIR=$(mktemp -d)
-STASHED=0
+WITHOUT_TREE=""
@@
-# Ensure git stash is restored on exit
+# Ensure temp worktree is cleaned up on exit
 cleanup() {
-    if [ "$STASHED" -eq 1 ]; then
-        echo ""
-        echo "Restoring stashed changes..."
-        git stash pop -q || echo "Warning: failed to restore stash"
-    fi
+    if [ -n "$WITHOUT_TREE" ] && [ -d "$WITHOUT_TREE" ]; then
+        git worktree remove --force "$WITHOUT_TREE" >/dev/null 2>&1 || true
+    fi
 }
 trap cleanup EXIT
@@
-echo "Stashing buffer pool changes..."
-git stash push -q pkg/resource/bufferpool.go pkg/resource/apply.go pkg/resource/patch.go
-STASHED=1
+echo "Preparing baseline worktree (without buffer pool commit)..."
+WITHOUT_TREE=$(mktemp -d)
+git worktree add --detach "$WITHOUT_TREE" HEAD~1 >/dev/null
@@
 echo "Running benchmarks WITHOUT buffer pool..."
-go test -bench='^Benchmark(Apply|Patch|Controller)' -benchmem -benchtime=${BENCHTIME} -count=${COUNT} ./pkg/resource/ > "${TEMP_DIR}/without.txt" 2>&1
-
-echo "Restoring changes..."
-git stash pop -q
-STASHED=0
+(cd "$WITHOUT_TREE" && go test -bench='^Benchmark(Apply|Patch|Controller)' -benchmem -benchtime=${BENCHTIME} -count=${COUNT} ./pkg/resource/) > "${TEMP_DIR}/without.txt" 2>&1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/benchmark-apply-comparison.sh` around lines 37 - 45, The script
currently uses git stash push/pop and STASHED which is unreliable because the
buffer-pool change is already committed; replace the stash-based flow with
creating a temporary git worktree for the baseline commit (e4203bc0b or the
commit hash/tag you want to benchmark), run the same go test bench command
(using BENCHTIME, COUNT and TEMP_DIR) inside that worktree and redirect output
to without.txt, then remove the temporary worktree; then run the “with”
benchmark in the main worktree producing with.txt—this avoids git stash push/pop
and the STASHED flag and prevents restoring unrelated stashes.

Comment on lines +8 to +10
if ! command -v setup-envtest &> /dev/null; then
echo "run 'make install-tools' to install setup-envtest"
fi
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Script continues even when setup-envtest is missing.

The check detects when setup-envtest is not installed but doesn't exit, causing the script to fail later with a confusing error when it tries to execute setup-envtest use.

🐛 Proposed fix
 if ! command -v setup-envtest &> /dev/null; then
     echo "run 'make install-tools' to install setup-envtest"
+    exit 1
 fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if ! command -v setup-envtest &> /dev/null; then
echo "run 'make install-tools' to install setup-envtest"
fi
if ! command -v setup-envtest &> /dev/null; then
echo "run 'make install-tools' to install setup-envtest"
exit 1
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup-envtest.sh` around lines 8 - 10, The script currently warns when
the setup-envtest binary is missing but continues and later fails; modify the
hack/setup-envtest.sh check so that after detecting command -v setup-envtest
returns false it exits with a non-zero status (e.g., echo the message and run
exit 1) to stop execution immediately; reference the check that invokes
setup-envtest and ensure any callers won’t proceed when the binary is absent.

Comment on lines +290 to +314
// BenchmarkConcurrentApply benchmarks concurrent apply operations
func BenchmarkConcurrentApply(b *testing.B) {
_, cfg := setupTestEnv(b)

logger := log.New()
logger.SetLevel(log.ErrorLevel)

helper, err := NewHelper(logger, FromRESTConfig(cfg), WithControllerName("benchmark"))
if err != nil {
b.Fatalf("failed to create helper: %v", err)
}

b.RunParallel(func(pb *testing.PB) {
resources := [][]byte{configMapYAML, secretYAML, serviceAccountYAML}
i := 0
for pb.Next() {
resource := resources[i%len(resources)]
_, err := helper.Apply(resource)
if err != nil {
b.Fatalf("apply failed: %v", err)
}
i++
}
})
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Calling b.Fatalf inside b.RunParallel is unsafe.

b.Fatalf should not be called from goroutines spawned by b.RunParallel. It calls runtime.Goexit() which only exits the current goroutine, potentially causing race conditions or incomplete test termination. Use error collection with synchronization instead.

🔧 Proposed fix using atomic error tracking
 // BenchmarkConcurrentApply benchmarks concurrent apply operations
 func BenchmarkConcurrentApply(b *testing.B) {
 	_, cfg := setupTestEnv(b)
 
 	logger := log.New()
 	logger.SetLevel(log.ErrorLevel)
 
 	helper, err := NewHelper(logger, FromRESTConfig(cfg), WithControllerName("benchmark"))
 	if err != nil {
 		b.Fatalf("failed to create helper: %v", err)
 	}
 
+	var applyErr atomic.Pointer[error]
 	b.RunParallel(func(pb *testing.PB) {
 		resources := [][]byte{configMapYAML, secretYAML, serviceAccountYAML}
 		i := 0
 		for pb.Next() {
 			resource := resources[i%len(resources)]
 			_, err := helper.Apply(resource)
 			if err != nil {
-				b.Fatalf("apply failed: %v", err)
+				applyErr.CompareAndSwap(nil, &err)
+				return
 			}
 			i++
 		}
 	})
+	if e := applyErr.Load(); e != nil {
+		b.Fatalf("apply failed: %v", *e)
+	}
 }

Add the import for sync/atomic at the top of the file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/resource/apply_benchmark_test.go` around lines 290 - 314, The benchmark
calls b.Fatalf from within goroutines launched by b.RunParallel (in
BenchmarkConcurrentApply and the inner closure using helper.Apply), which is
unsafe; change the error handling to record the first error from concurrent
goroutines using a concurrency-safe mechanism (e.g., an atomic.Value or an int32
flag plus storing the error string) and call b.Fatalf (or b.Fatal) exactly once
from the main benchmark goroutine after RunParallel returns if any error was
recorded; add the sync/atomic import and modify the loop to set the atomic
error/flag when helper.Apply returns an error instead of calling b.Fatalf
directly.

Comment on lines +299 to +306
if info, err = os.Stat(path); os.IsNotExist(err) {
if options.ErrorIfPathMissing {
return nil, err
}
continue
}

if !info.IsDir() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle non-ENOENT os.Stat errors before dereferencing info.

Line 299 only special-cases os.IsNotExist(err). A permission or transient I/O error leaves info == nil, and Line 306 then panics on info.IsDir() instead of returning the original error.

Proposed fix
-		// Return the error if ErrorIfPathMissing exists
-		if info, err = os.Stat(path); os.IsNotExist(err) {
-			if options.ErrorIfPathMissing {
-				return nil, err
-			}
-			continue
-		}
+		if info, err = os.Stat(path); err != nil {
+			if os.IsNotExist(err) && !options.ErrorIfPathMissing {
+				continue
+			}
+			return nil, err
+		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if info, err = os.Stat(path); os.IsNotExist(err) {
if options.ErrorIfPathMissing {
return nil, err
}
continue
}
if !info.IsDir() {
if info, err = os.Stat(path); err != nil {
if os.IsNotExist(err) && !options.ErrorIfPathMissing {
continue
}
return nil, err
}
if !info.IsDir() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vendor/sigs.k8s.io/controller-runtime/pkg/envtest/crd.go` around lines 299 -
306, The code calls os.Stat(path) and only checks os.IsNotExist(err), then
dereferences info via info.IsDir(), which can panic for non-ENOENT errors;
update the logic around the os.Stat call (the variables info and err) so you
first check if err != nil: if os.IsNotExist(err) then handle
options.ErrorIfPathMissing as before (continue or return), otherwise return the
original err; only when err == nil should you dereference info and call
info.IsDir(). Ensure this change is applied to the block that uses os.Stat(path)
and the surrounding control flow that references options.ErrorIfPathMissing.

Comment on lines +111 to +118
func updateClientConfig(cc *admissionv1.WebhookClientConfig, hostPort string, caData []byte) {
cc.CABundle = caData
if cc.Service != nil && cc.Service.Path != nil {
url := fmt.Sprintf("https://%s/%s", hostPort, *cc.Service.Path)
cc.URL = &url
cc.Service = nil
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Rewrite service-based webhook configs even when service.path is omitted.

admissionregistration.k8s.io/v1 makes Service.Path optional. With the current cc.Service != nil && cc.Service.Path != nil guard, manifests that omit path keep the in-cluster Service reference and never get switched to the local envtest URL.

Proposed fix
 func updateClientConfig(cc *admissionv1.WebhookClientConfig, hostPort string, caData []byte) {
 	cc.CABundle = caData
-	if cc.Service != nil && cc.Service.Path != nil {
-		url := fmt.Sprintf("https://%s/%s", hostPort, *cc.Service.Path)
+	if cc.Service != nil {
+		path := ""
+		if cc.Service.Path != nil {
+			path = *cc.Service.Path
+		}
+		url := fmt.Sprintf("https://%s%s", hostPort, path)
 		cc.URL = &url
 		cc.Service = nil
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vendor/sigs.k8s.io/controller-runtime/pkg/envtest/webhook.go` around lines
111 - 118, The updateClientConfig function currently only rewrites service-based
configs when cc.Service.Path is set; change it to rewrite whenever cc.Service !=
nil so manifests that omit Service.Path are converted to the local envtest URL.
In updateClientConfig, keep setting cc.CABundle, then if cc.Service != nil build
a URL using hostPort and the optional path (e.g. if cc.Service.Path != nil
append "/" + *cc.Service.Path, otherwise use hostPort with no extra path or a
single "/"), assign cc.URL to that value and set cc.Service = nil; reference the
updateClientConfig function and the cc.Service / cc.Service.Path symbols to
locate the change.

Comment on lines +269 to +276
if e.VerifySum && e.Platform.Hash == nil {
if err := e.Client.FetchSum(ctx, *e.Version.AsConcrete(), &e.Platform); err != nil {
ExitCause(2, err, "unable to fetch hash for requested version")
}
}
if !e.VerifySum {
e.Platform.Hash = nil // skip verification
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

VerifySum never validates the downloaded archive.

Lines 269-276 fetch e.Platform.Hash, but Lines 298-310 stream the archive straight into Store.Add. In the code shown here, neither HTTPClient.GetVersion nor Store.Add consumes that hash, so a tampered download is still accepted even when VerifySum is true.

Also applies to: 298-310

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/env/env.go` around
lines 269 - 276, The current flow fetches e.Platform.Hash via e.Client.FetchSum
but never uses it when downloading and storing the archive; modify the download
path (the code that calls e.HTTPClient.GetVersion and then e.Store.Add) to
verify the downloaded bytes against e.Platform.Hash when e.VerifySum is true:
compute the same hash algorithm used by FetchSum (e.g., SHA-256) while streaming
the response (or after full download), compare the computed digest to
e.Platform.Hash, and return/ExitCause on mismatch before calling e.Store.Add;
ensure the verification logic references e.VerifySum, e.Platform.Hash,
e.Client.FetchSum, e.HTTPClient.GetVersion and e.Store.Add so the integrity
check is applied consistently.

Comment on lines +423 to +457
func (e *Env) PathMatches(value string) bool {
e.Log.V(1).Info("checking if (env var) path represents our desired version", "path", value)
if value == "" {
// if we're unset,
return false
}

if e.versionFromPathName(value) {
e.Log.V(1).Info("path appears to be in our store, using that info", "path", value)
return true
}

e.Log.V(1).Info("path is not in our store, checking for binaries", "path", value)
for _, expected := range expectedExecutables {
_, err := e.FS.Stat(filepath.Join(value, expected))
if err != nil {
if errors.Is(err, fs.ErrNotExist) {
// one of our required binaries is missing, return false
e.Log.V(1).Info("missing required binary in (env var) path", "binary", expected, "path", value)
return false
}
ExitCause(2, err, "unable to check for existence of binary %s from existing (env var) path %s", value, expected)
}
}

// success, all binaries present
e.Log.V(1).Info("all required binaries present in (env var) path, using that", "path", value)

// don't bother checking the version, the user explicitly asked us to use this
// we don't know the version, so set it to wildcard
e.Version = versions.AnyVersion
e.Platform.OS = "*"
e.Platform.Arch = "*"
e.manualPath = value
return true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reject store paths that do not satisfy the current version/platform selector.

PathMatches returns true as soon as versionFromPathName can parse a versioned basename. Because versionFromPathName also mutates e.Version and e.Platform, a path like 1.29.0-darwin-amd64 can satisfy a request for a different version or platform.

Also applies to: 464-477

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/env/env.go` around
lines 423 - 457, PathMatches erroneously accepts a path as soon as
versionFromPathName parses a versioned basename, which can mutate
e.Version/e.Platform to values that don't satisfy the caller's requested
selectors; change PathMatches so that it does not immediately accept/commit
parsed values from versionFromPathName — instead, first parse the
version/platform from the basename (call versionFromPathName in a non-mutating
way or extract the parsed result without assigning to e.Version/e.Platform),
verify the parsed version/platform satisfy the current selector/constraints, and
only then set e.Version, e.Platform and return true; if the parsed values do not
match the selector, continue checking binaries (and fallback logic) as before.
Ensure the same non-mutating verification approach is applied to the analogous
code path later in the file that currently returns early after
versionFromPathName succeeds.

Comment on lines +31 to +34
// PrintEnv prints the path with the corresponding env variable, so that
// you can source the output like
// `source $(fetch-envtest switch -p env 1.20.x)`.
PrintEnv
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Comment example appears to use the wrong command name.

Line 33 shows fetch-envtest, but the tool/documentation uses setup-envtest. This can confuse future edits.

✏️ Suggested comment fix
-	// `source $(fetch-envtest switch -p env 1.20.x)`.
+	// `source $(setup-envtest use -p env 1.20.x)`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/env/helpers.go`
around lines 31 - 34, Update the doc comment above the PrintEnv symbol to use
the correct command name "setup-envtest" instead of "fetch-envtest"; locate the
comment block referencing PrintEnv in helpers.go and replace the example
invocation ``source $(fetch-envtest switch -p env 1.20.x)`` with ``source
$(setup-envtest switch -p env 1.20.x)`` so the documentation matches the actual
tool name.

Comment on lines +107 to +112
As noted about, you can use `ENVTEST_INSTALLED_ONLY=true` to switch `-i`
on by default, and you can use `ENVTEST_USE_ENV=true` to switch
`--use-env` on by default.

- If you want to use this tool, but download your gziped tarballs
separately, you can use the `sideload` command. You'll need to use the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix small wording typos in offline usage docs.

Line 107 reads “As noted about” (likely “As noted above”), and Line 111 uses “gziped” (should be “gzipped”).

✏️ Suggested doc fix
-  As noted about, you can use `ENVTEST_INSTALLED_ONLY=true` to switch `-i`
+  As noted above, you can use `ENVTEST_INSTALLED_ONLY=true` to switch `-i`
@@
-- If you want to use this tool, but download your gziped tarballs
+- If you want to use this tool, but download your gzipped tarballs
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
As noted about, you can use `ENVTEST_INSTALLED_ONLY=true` to switch `-i`
on by default, and you can use `ENVTEST_USE_ENV=true` to switch
`--use-env` on by default.
- If you want to use this tool, but download your gziped tarballs
separately, you can use the `sideload` command. You'll need to use the
As noted above, you can use `ENVTEST_INSTALLED_ONLY=true` to switch `-i`
on by default, and you can use `ENVTEST_USE_ENV=true` to switch
`--use-env` on by default.
- If you want to use this tool, but download your gzipped tarballs
separately, you can use the `sideload` command. You'll need to use the
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/README.md` around
lines 107 - 112, Fix two typos in the README text: change “As noted about” to
“As noted above” and “gziped” to “gzipped” in the paragraph that mentions
ENVTEST_INSTALLED_ONLY and ENVTEST_USE_ENV and the follow-up sentence about the
sideload command; update the strings in the README (look for occurrences of
ENVTEST_INSTALLED_ONLY, ENVTEST_USE_ENV, and “sideload”) so the corrected
wording reads naturally.

Comment on lines +35 to +41
func (p Platform) BaseName(ver Concrete) string {
return fmt.Sprintf("%d.%d.%d-%s-%s", ver.Major, ver.Minor, ver.Patch, p.OS, p.Arch)
}

// ArchiveName returns the full archive name for this version and platform.
func (p Platform) ArchiveName(ver Concrete) string {
return "envtest-v" + p.BaseName(ver) + ".tar.gz"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Preserve prerelease identifiers in BaseName.

ExtractWithPlatform and ConcreteVersionRE support prerelease components, but BaseName collapses everything to major.minor.patch. That makes versions like 1.31.0-alpha.1 and 1.31.0 share the same store/archive name and breaks round-tripping through VersionPlatformRE.

Proposed fix
 func (p Platform) BaseName(ver Concrete) string {
-	return fmt.Sprintf("%d.%d.%d-%s-%s", ver.Major, ver.Minor, ver.Patch, p.OS, p.Arch)
+	return fmt.Sprintf("%s-%s-%s", ver, p.OS, p.Arch)
 }

Also applies to: 94-113

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@vendor/sigs.k8s.io/controller-runtime/tools/setup-envtest/versions/platform.go`
around lines 35 - 41, BaseName currently formats only major.minor.patch which
drops prerelease parts and causes collisions; update Platform.BaseName to
include ver.Prerelease (and any build metadata if present) when constructing the
version string so prerelease-aware versions (e.g., 1.31.0-alpha.1) stay
distinct, and ensure ArchiveName (which uses BaseName) inherits that change;
locate and update the BaseName method in the Platform type and any other places
that duplicate the base-name construction (see functions named BaseName and
ArchiveName and the regex helpers ExtractWithPlatform / ConcreteVersionRE /
VersionPlatformRE for consistency) so stored/archive names preserve prerelease
identifiers.

@dlom
Copy link
Contributor Author

dlom commented Mar 17, 2026

need a way to exclude vendor changes from coderabbit...

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 17, 2026

@dlom: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@codecov
Copy link

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 0% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.27%. Comparing base (4f955dc) to head (e4203bc).
⚠️ Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
pkg/resource/bufferpool.go 0.00% 20 Missing ⚠️
pkg/resource/apply.go 0.00% 4 Missing ⚠️
pkg/resource/patch.go 0.00% 2 Missing ⚠️
pkg/resource/serializer.go 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2873      +/-   ##
==========================================
- Coverage   50.30%   50.27%   -0.03%     
==========================================
  Files         280      281       +1     
  Lines       34310    34327      +17     
==========================================
  Hits        17259    17259              
- Misses      15690    15707      +17     
  Partials     1361     1361              
Files with missing lines Coverage Δ
pkg/resource/serializer.go 9.37% <0.00%> (ø)
pkg/resource/patch.go 0.00% <0.00%> (ø)
pkg/resource/apply.go 0.00% <0.00%> (ø)
pkg/resource/bufferpool.go 0.00% <0.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@2uasimojo
Copy link
Member

need a way to exclude vendor changes from coderabbit...

This should be doing it. We oughtta ask why it isn't. (I'm having the same experience on #2851.)

@dlom
Copy link
Contributor Author

dlom commented Mar 18, 2026

closing this for now, we need a better benchmark suite first in a separate PR

@dlom dlom closed this Mar 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants