Skip to content

Do not cache kubeconfig for exec based auth in AsyncKubernetesHook#65212

Open
amoghrajesh wants to merge 2 commits intoapache:mainfrom
astronomer:pr-61738-review
Open

Do not cache kubeconfig for exec based auth in AsyncKubernetesHook#65212
amoghrajesh wants to merge 2 commits intoapache:mainfrom
astronomer:pr-61738-review

Conversation

@amoghrajesh
Copy link
Copy Markdown
Contributor


Was generative AI tooling used to co-author this PR?
  • Yes - Cursor with Sonnet 4.6

Reattempt at: #61738

closes: #61738

Deferrable tasks on EKS/GKE can run longer than the 15-minute token lifetime issued by exec auth plugins (aws eks get-token, gke-gcloud-auth-plugin). Because _load_config() caches the kubeconfig on the first call, subsequent
calls are no operations and the expired token gets reused, causing auth failures.

Attempting to redo the PR by addressing review comments from myself

  • Add _uses_exec_auth() to detect exec-based auth in the active context user only (not any user in the file), passing
  • Skip caching when exec auth is detected; all other auth paths continue to cache as before

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@boring-cyborg boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Apr 14, 2026
@amoghrajesh amoghrajesh self-assigned this Apr 14, 2026
@amoghrajesh
Copy link
Copy Markdown
Contributor Author

Adding back same set of reviewers: @shahar1 @potiuk

Copy link
Copy Markdown
Contributor

@shahar1 shahar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! :)

Copy link
Copy Markdown
Collaborator

@sunank200 sunank200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM.

Comment on lines +883 to +886
if not self._uses_exec_auth(self.config_dict, context=cluster_context):
self._config_loaded = True

return
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: When exec auth is detected, load_kube_config_from_dict is called on every _load_config() invocation since _config_loaded stays False. Should we consider separating config initialised from config cached to avoid redundant loads?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah actually having a new config for this might turn out cleaner. Handling that using a new private class var: _is_exec_auth, with an intent to run _uses_exec_auth once per instance.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates AsyncKubernetesHook kubeconfig loading to avoid caching configs that use exec-based authentication (e.g. EKS/GKE), preventing expired short-lived tokens from being reused across long-running/deferrable tasks.

Changes:

  • Add _uses_exec_auth() to detect exec-based auth in the active kubeconfig context user (with a conservative fallback).
  • Skip setting _config_loaded (i.e., disable caching) when exec-based auth is detected; keep existing caching behavior for non-exec auth.
  • Add unit tests covering _uses_exec_auth() and caching behavior for dict/string kubeconfigs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/hooks/kubernetes.py Adds exec-auth detection and conditionally disables config caching to avoid reusing expired exec tokens.
providers/cncf/kubernetes/tests/unit/cncf/kubernetes/hooks/test_kubernetes.py Adds tests for exec-auth detection and verifies caching vs non-caching behavior for kubeconfig inputs.

@eladkal
Copy link
Copy Markdown
Contributor

eladkal commented Apr 14, 2026

Isn't that the same issue as #63610 which we concluded as won't fix since the solution is to bump botocore version #60943 (comment) ?

@amoghrajesh
Copy link
Copy Markdown
Contributor Author

@eladkal I wasn't aware of that one so I investigated a bit.

They are actually distinct issues.

#63610 was trying to fix a race condition, ie: concurrent tasks fighting over the aws cli cache dir, causing FileExistsError. A new version of botocore helped solve it because botocore ≥ 1.40.2 seemed to handle it internally.

This PR is different and is a token expiry issue specific to AsyncKubernetesHook. On the first _load_config() call, the exec plugin runs and obtains a token which is short lived. Because _config_loaded = True is set, all future calls return early and do not renew the token resulting to failed auth.

@eladkal
Copy link
Copy Markdown
Contributor

eladkal commented Apr 15, 2026

This PR is different and is a token expiry issue specific to AsyncKubernetesHook. On the first _load_config() call, the exec plugin runs and obtains a token which is short lived. Because _config_loaded = True is set, all future calls return early and do not renew the token resulting to failed auth.

Since this fix is for EKS and GKE I guess we should bump the k8s provider version in these providers?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants