filter_kubernetes: don't recycle connections in fetch_pod_service_map#11600

Open
edsiper wants to merge 1 commit into master from kube_pod_leak

@edsiper edsiper commented Mar 21, 2026

  • mark connections as non-recyclable before releasing to free TLS resources
  • connections are destroyed instead of pooled since polling runs every 60s anyway
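The effect of the two points above can be sketched with a toy pool. This is only an illustration under stated assumptions: `toy_conn`, `toy_pool`, `toy_conn_get`, `toy_conn_release` and `toy_poll_once` are hypothetical names, not Fluent Bit APIs, and the sketch does not reproduce the leak from #11523. It only shows that a connection marked non-recyclable has its per-connection state freed at release time, while a recyclable one keeps that state alive in the pool.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: none of these names come from Fluent Bit. */
struct toy_conn {
    int recycle;        /* non-zero: may go back to the pool on release */
    void *tls_session;  /* stands in for per-connection TLS state */
};

struct toy_pool {
    struct toy_conn *idle;   /* at most one idle connection, for brevity */
    int live_tls_sessions;   /* TLS sessions currently allocated */
};

/* Reuse the idle connection if there is one, otherwise create a new one. */
static struct toy_conn *toy_conn_get(struct toy_pool *p)
{
    struct toy_conn *c;

    if (p->idle != NULL) {
        c = p->idle;
        p->idle = NULL;
        return c;
    }

    c = calloc(1, sizeof(*c));
    if (c == NULL) {
        return NULL;
    }
    c->tls_session = malloc(64);
    if (c->tls_session == NULL) {
        free(c);
        return NULL;
    }
    c->recycle = 1;
    p->live_tls_sessions++;
    return c;
}

/* Pool the connection if it is recyclable, otherwise destroy it. */
static void toy_conn_release(struct toy_pool *p, struct toy_conn *c)
{
    if (c->recycle && p->idle == NULL) {
        p->idle = c;    /* pooled: TLS state stays allocated */
        return;
    }
    free(c->tls_session);
    free(c);
    p->live_tls_sessions--;
}

/* One polling cycle; returns the number of TLS sessions still allocated. */
static int toy_poll_once(struct toy_pool *p, int recycle)
{
    struct toy_conn *c = toy_conn_get(p);

    if (c == NULL) {
        return -1;
    }
    c->recycle = recycle;   /* the "mark before releasing" step */
    toy_conn_release(p, c);
    return p->live_tls_sessions;
}
```

With `recycle` set to 0 on every cycle, nothing is held between polls. Since the poll only runs every 60s, the cost of opening a fresh connection each time is assumed here to be acceptable.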

fixes #11523


Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes
    • Improved connection management in the Kubernetes AWS integration: upstream connections are now marked as non-recyclable at failure points and during cleanup, so they are destroyed on release instead of being returned to the pool.

- mark connections as non-recyclable before releasing to free TLS resources
- connections are destroyed instead of pooled since polling runs every 60s anyway

fixes #11523

Signed-off-by: Cameron Sparr <sparrc@users.noreply.github.com>
coderabbitai bot commented Mar 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8d433a78-7634-456f-b9eb-858e23e90580

📥 Commits

Reviewing files that changed from the base of the PR and between fc8dbd4 and b8a6bee.

📒 Files selected for processing (1)
  • plugins/filter_kubernetes/kubernetes_aws.c

📝 Walkthrough

Modified the Kubernetes AWS filter's fetch_pod_service_map function to call flb_upstream_conn_recycle(u_conn, FLB_FALSE) at failure points and during cleanup. The call marks the upstream connection as non-recyclable before it is released, which addresses the memory leak seen when Use_Pod_Association is enabled.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Connection Recycling Fix: plugins/filter_kubernetes/kubernetes_aws.c | Added fluent-bit/flb_upstream.h include and three flb_upstream_conn_recycle(u_conn, FLB_FALSE) calls within fetch_pod_service_map: after HTTP client creation failure, after non-200/failed HTTP responses (before releasing connection), and during cleanup after HTTP client destruction. Ensures upstream connections are marked non-recyclable before release. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • cosmo0920

Poem

🐰 Connections that leak make memory weak,
So recycle them well, every turn and each seek!
No more growing heaps in the kubernetes way,
The upstream flows clean now, hooray, hooray! 🌊

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Linked Issues check | ❓ Inconclusive | The PR adds connection recycling prevention to address issue #11523's memory leak, but limited code context prevents verification that this alone resolves the underlying TLS/OpenSSL resource leaks identified in the heap profiles. | Confirm this change sufficiently addresses the TLS resource leaks, or clarify if additional fixes (SSL_CTX/session cleanup) are planned in follow-up PRs. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: explicitly marking upstream connections as non-recyclable before release in fetch_pod_service_map. |
| Out of Scope Changes check | ✅ Passed | All changes are focused on preventing connection recycling in fetch_pod_service_map to fix the identified memory leak; no unrelated modifications are present. |

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b8a6bee052

-    /* Cleanup */
+    /* Cleanup - mark connection as non-recyclable to prevent memory leak */
     flb_http_client_destroy(c);
+    flb_upstream_conn_recycle(u_conn, FLB_FALSE);

P2: Keep successful pod-association fetches recyclable

When the pod-association endpoint rotates its mTLS certs, this line forces the very next refresh to open a new TLS session with the old in-memory ctx->aws_pod_association_tls object. That TLS context is only rebuilt in flb_kube_pod_association_init() (plugins/filter_kubernetes/kube_meta.c:2020-2045), and the surrounding comment in fetch_pod_service_map() (plugins/filter_kubernetes/kubernetes_aws.c:214-218) explicitly relies on reusing the existing connection until a failure triggers that rebuild. With recycle forced off after every 200 response, the first post-rotation refresh will fail and the pod/service map stays stale until the next refresh interval recreates the upstream.
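One way to read the suggestion in this comment is as a per-outcome policy: keep the connection recyclable only after a clean 200 response, and mark it non-recyclable on any failure. The sketch below is a hypothetical helper, not code from the PR or from Fluent Bit; `fetch_result` and `should_recycle` are names made up for illustration. Whether such a policy would bring back the leak reported in #11523 is not settled in this thread.

```c
#include <assert.h>

/* Hypothetical outcome codes, not part of Fluent Bit. */
enum fetch_result {
    FETCH_OK,            /* request completed */
    FETCH_CLIENT_ERROR,  /* HTTP client could not be created */
    FETCH_HTTP_ERROR     /* request failed in transit */
};

/*
 * Returns 1 when the connection may go back to the keep-alive pool,
 * 0 when it should be destroyed on release.
 */
static int should_recycle(enum fetch_result result, int http_status)
{
    return result == FETCH_OK && http_status == 200;
}
```

Under this assumption, the cleanup path would pass the helper's result to flb_upstream_conn_recycle() instead of a hard-coded FLB_FALSE, while the two failure paths would keep FLB_FALSE.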

Development

Successfully merging this pull request may close these issues.

filter_kubernetes: memory leak when Use_Pod_Association is enabled
