Skip to content

Avoid RBAC errors when Operator can't list or watch ConfigMaps#2889

Open
levan-m wants to merge 3 commits intomainfrom
levan-m/cm-watch-without-rbac
Open

Avoid RBAC errors when Operator can't list or watch ConfigMaps#2889
levan-m wants to merge 3 commits intomainfrom
levan-m/cm-watch-without-rbac

Conversation

@levan-m
Copy link
Copy Markdown
Collaborator

@levan-m levan-m commented Apr 10, 2026

What does this PR do?

reported in #2886

Fix similar to #2793 for configmaps

  • Don't start CM watch in helm metadata forwarder if Operator doesn't have list/watch RBAC.

Motivation

What inspired you to submit this pull request?

Additional Notes

Anything else we should know when reviewing?

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: vX.Y.Z
  • Cluster Agent: vX.Y.Z

Describe your test plan

  1. install Operator
❯ helm install operator datadog/datadog-operator \
--set datadogMonitor.enabled=true \
--set datadogAgent.enabled=false \
--set datadogAgentInternal.enabled=false
  1. Update rbac to drop secret watch, list permissions, restart operator pod.
kubectl get clusterrole operator-datadog-operator -o json \
    | jq '
        .rules |= map(
          if (.resources // [] | index("configmaps")) != null
          then .resources |= map(select(. != "configmaps"))
          else .
          end
        )
        | .rules += [{"apiGroups":[""],"resources":["configmaps"],"verbs":["create","delete","get","patch","update"]}]
      ' \
    | kubectl apply -f -

kubectl rollout restart deployment/operator-datadog-operator
  1. Observer errors
    {"level":"ERROR","ts":"2026-04-10T14:22:21.312Z","logger":"controller-runtime.cache.UnhandledError","msg":"Failed to watch","reflector":"pkg/mod/k8s.io/client-go@v0.35.1/tools/cache/reflector.go:289","type":"*v1.ConfigMap","error":"failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:default:operator-datadog-operator" cannot list resource "configmaps" in API group "" in the namespace "default"","stacktrace":"k8s.io/apimachinery/pkg/util/runtime.logError\n\t/go/pkg/mod/k8s.io/apimachinery@v0.35.1/pkg/util/runtime/runtime.go:221\n

  2. Update to fixed image; after restart Operator should log
    {"level":"INFO","ts":"2026-04-10T14:18:38.100Z","logger":"metadata.helm","msg":"No permission to list/watch ConfigMaps, Helm metadata collection from ConfigMaps will be disabled"}

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label
  • All commits are signed (see: signing commits)

@levan-m levan-m added this to the v1.26.0 milestone Apr 10, 2026
@levan-m levan-m added the bug Something isn't working label Apr 10, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2f77d142c3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 172 to 175
ResourceAttributes: &authorizationv1.ResourceAttributes{
Verb: verb,
Resource: "secrets",
Resource: resource,
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include namespace in RBAC self-check

The new canListWatch probe creates SelfSubjectAccessReview requests without ResourceAttributes.Namespace, which turns the check into an all-namespaces authorization query for a namespaced resource. In namespace-scoped deployments (the manager cache is explicitly scoped via WATCH_NAMESPACE/DD_AGENT_WATCH_NAMESPACE), a service account can legitimately list/watch ConfigMaps in its watched namespace but still fail this cluster-wide SSAR, causing Start() to set configMapAccessEnabled=false and disable ConfigMap Helm metadata collection unnecessarily.

Useful? React with 👍 / 👎.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 0% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.04%. Comparing base (ef88adb) to head (9ddbf1d).

Files with missing lines Patch % Lines
pkg/controller/utils/metadata/helm_metadata.go 0.00% 47 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2889      +/-   ##
==========================================
- Coverage   40.06%   40.04%   -0.02%     
==========================================
  Files         319      319              
  Lines       28039    28050      +11     
==========================================
  Hits        11233    11233              
- Misses      15983    15994      +11     
  Partials      823      823              
Flag Coverage Δ
unittests 40.04% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
pkg/controller/utils/metadata/helm_metadata.go 25.89% <0.00%> (-0.88%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ef88adb...9ddbf1d. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@tbavelier tbavelier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with a small nit: there's a rare/unlikely possibility that RBAC passes but informer registration fails but the flag doesn't reflect that (same for the previous PR) for processKey (it's true despite informer registration failing), so while we still log we disable the collection, we actually do not change the flag. See suggestion below, it applies to both secrets and configmaps: #2894
Note in practice I'm not sure how this could happen, so this is mostly defensive / meant to showcase how a unit test could fail

@levan-m levan-m requested a review from a team as a code owner April 13, 2026 14:20
@datadog-prod-us1-3
Copy link
Copy Markdown

❌ Code Coverage

Fix all issues with BitsAI

🛑 Gate Violations

🎯 1 Code Coverage issue detected

A Patch coverage percentage gate may be blocking this PR.

Patch coverage: 0.00% (threshold: 80.00%)

ℹ️ Info

🎯 Code Coverage (details)
Patch Coverage: 0.00%
Overall Coverage: 40.13% (-0.02%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 9ddbf1d | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working team/container-platform

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants