Do not cache kubeconfig for exec based auth in AsyncKubernetesHook by amoghrajesh · Pull Request #65212 · apache/airflow

amoghrajesh · 2026-04-14T12:17:19Z

Was generative AI tooling used to co-author this PR?

Yes - Cursor with Sonnet 4.6

Reattempt at: #61738

Deferrable tasks on EKS/GKE can outlive the 15-minute lifetime of tokens issued by exec auth plugins (aws eks get-token, gke-gcloud-auth-plugin). Because _load_config() caches the kubeconfig on the first call, subsequent calls are no-ops and the expired token gets reused, causing auth failures.

This redoes that PR and addresses my own review comments on it.

Add _uses_exec_auth() to detect exec-based auth in the active context user only (not any user in the file), passing
Skip caching when exec auth is detected; all other auth paths continue to cache as before
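
To illustrate the detection described above, here is a minimal sketch, not the code from this PR. The function name `uses_exec_auth`, the sample `kubeconfig` dict, and the fallback used when the active user cannot be resolved are all assumptions made for the example. The only thing taken from the kubeconfig format itself is that exec-based credentials sit under the `exec` key of a user entry.

```python
from typing import Any, Dict, Optional


def uses_exec_auth(config: Dict[str, Any], context: Optional[str] = None) -> bool:
    """Return True if the user of the active context authenticates via an exec plugin."""
    # Map user name -> user settings, tolerating missing or empty sections.
    users = {u.get("name"): u.get("user") or {} for u in config.get("users") or []}
    context_name = context or config.get("current-context")

    for entry in config.get("contexts") or []:
        if entry.get("name") == context_name:
            user_name = (entry.get("context") or {}).get("user")
            if user_name in users:
                # Only the active user matters, not every user in the file.
                return "exec" in users[user_name]
            break

    # Assumption for this sketch: if the active user cannot be resolved,
    # be conservative and report exec auth when any user defines it.
    return any("exec" in user for user in users.values())


# Hypothetical kubeconfig with one exec-based user and one static-token user.
kubeconfig = {
    "current-context": "eks",
    "contexts": [
        {"name": "eks", "context": {"cluster": "c1", "user": "eks-user"}},
        {"name": "local", "context": {"cluster": "c2", "user": "static-user"}},
    ],
    "users": [
        {"name": "eks-user", "user": {"exec": {"command": "aws", "args": ["eks", "get-token"]}}},
        {"name": "static-user", "user": {"token": "abc"}},
    ],
}
```

With this sample, `uses_exec_auth(kubeconfig)` inspects only `eks-user`, while `uses_exec_auth(kubeconfig, context="local")` inspects only `static-user`, so an exec user elsewhere in the file does not disable caching for a static-token context.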

amoghrajesh · 2026-04-14T12:20:00Z

Adding back same set of reviewers: @shahar1 @potiuk

shahar1

LGTM! :)

sunank200

Overall LGTM.

sunank200 · 2026-04-14T14:30:59Z

providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/hooks/kubernetes.py

+            if not self._uses_exec_auth(self.config_dict, context=cluster_context):
+                self._config_loaded = True
+
+            return


nit: When exec auth is detected, load_kube_config_from_dict is called on every _load_config() invocation since _config_loaded stays False. Should we consider separating config initialised from config cached to avoid redundant loads?

Yeah actually having a new config for this might turn out cleaner. Handling that using a new private class var: _is_exec_auth, with an intent to run _uses_exec_auth once per instance.
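
The idea in this exchange could look roughly like the sketch below. It is an illustration under assumptions, not the hook's actual implementation: `ConfigLoaderSketch`, its injected `detect_exec_auth` and `load` callables, and `run_demo` are hypothetical names. Only `_config_loaded` and `_is_exec_auth` come from the discussion. Detection runs once per instance, while the config itself is still reloaded on every call when exec auth is in use so that the token can be renewed.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class ConfigLoaderSketch:
    """Hypothetical stand-in for the hook's config-loading state."""

    def __init__(
        self,
        config_dict: Dict[str, Any],
        detect_exec_auth: Callable[[Dict[str, Any]], bool],
        load: Callable[[Dict[str, Any]], None],
    ) -> None:
        self.config_dict = config_dict
        self._detect_exec_auth = detect_exec_auth
        self._load = load
        self._config_loaded = False
        self._is_exec_auth: Optional[bool] = None  # None means "not checked yet"

    def load_config(self) -> None:
        # Run the detection at most once per instance.
        if self._is_exec_auth is None:
            self._is_exec_auth = self._detect_exec_auth(self.config_dict)
        # Static credentials: reuse the already loaded config.
        if self._config_loaded and not self._is_exec_auth:
            return
        # Exec auth (or first call): load again so the plugin can issue a fresh token.
        self._load(self.config_dict)
        self._config_loaded = True


def run_demo(is_exec: bool, calls: int = 3) -> Tuple[int, int]:
    """Call load_config() several times and count detections and loads."""
    counts = {"detect": 0, "load": 0}

    def detect(_cfg: Dict[str, Any]) -> bool:
        counts["detect"] += 1
        return is_exec

    def load(_cfg: Dict[str, Any]) -> None:
        counts["load"] += 1

    loader = ConfigLoaderSketch({}, detect, load)
    for _ in range(calls):
        loader.load_config()
    return counts["detect"], counts["load"]
```

In this sketch `run_demo(True)` performs one detection and three loads, while `run_demo(False)` performs one detection and one load.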

Copilot

Pull request overview

This PR updates AsyncKubernetesHook kubeconfig loading to avoid caching configs that use exec-based authentication (e.g. EKS/GKE), preventing expired short-lived tokens from being reused across long-running/deferrable tasks.

Changes:

Add _uses_exec_auth() to detect exec-based auth in the active kubeconfig context user (with a conservative fallback).
Skip setting _config_loaded (i.e., disable caching) when exec-based auth is detected; keep existing caching behavior for non-exec auth.
Add unit tests covering _uses_exec_auth() and caching behavior for dict/string kubeconfigs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
`providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/hooks/kubernetes.py`	Adds exec-auth detection and conditionally disables config caching to avoid reusing expired exec tokens.
`providers/cncf/kubernetes/tests/unit/cncf/kubernetes/hooks/test_kubernetes.py`	Adds tests for exec-auth detection and verifies caching vs non-caching behavior for kubeconfig inputs.

eladkal · 2026-04-14T18:45:31Z

Isn't that the same issue as #63610 which we concluded as won't fix since the solution is to bump botocore version #60943 (comment) ?

amoghrajesh · 2026-04-15T07:26:37Z

@eladkal I wasn't aware of that one so I investigated a bit.

They are actually distinct issues.

#63610 was trying to fix a race condition, i.e. concurrent tasks fighting over the AWS CLI cache dir, causing FileExistsError. A newer botocore helped because botocore ≥ 1.40.2 seemed to handle it internally.

This PR is different: it is a token expiry issue specific to AsyncKubernetesHook. On the first _load_config() call, the exec plugin runs and obtains a short-lived token. Because _config_loaded = True is then set, all future calls return early and never renew the token, resulting in failed auth.
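
The failure mode can be reproduced in isolation with a small simulation. This is a sketch under assumptions: `FakeExecPlugin`, `Loader`, `Token`, and `is_valid` are hypothetical names with no counterpart in Airflow or the Kubernetes client, and time is passed in explicitly instead of read from a clock. The 15-minute lifetime is the figure quoted in the PR description.

```python
from dataclasses import dataclass
from typing import Optional

TOKEN_LIFETIME_S = 15 * 60  # token lifetime quoted in the PR description


@dataclass
class Token:
    value: str
    expires_at: float


class FakeExecPlugin:
    """Stands in for an exec plugin that issues short-lived tokens."""

    def __init__(self) -> None:
        self.calls = 0

    def get_token(self, now: float) -> Token:
        self.calls += 1
        return Token(f"token-{self.calls}", now + TOKEN_LIFETIME_S)


class Loader:
    """Loads credentials, optionally caching the first result forever."""

    def __init__(self, plugin: FakeExecPlugin, cache: bool) -> None:
        self.plugin = plugin
        self.cache = cache
        self._token: Optional[Token] = None

    def load(self, now: float) -> Token:
        if self.cache and self._token is not None:
            return self._token  # early return: the plugin is never asked again
        self._token = self.plugin.get_token(now)
        return self._token


def is_valid(token: Token, now: float) -> bool:
    return now < token.expires_at
```

A loader created with `cache=True` that loads at `now=0.0` and again at `now=1200.0` (20 minutes later) hands back the first token, which `is_valid` rejects. The same sequence with `cache=False` calls the plugin twice and returns a token that is still valid.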

eladkal · 2026-04-15T08:41:12Z

> This PR is different: it is a token expiry issue specific to AsyncKubernetesHook. On the first _load_config() call, the exec plugin runs and obtains a short-lived token. Because _config_loaded = True is then set, all future calls return early and never renew the token, resulting in failed auth.

Since this fix is for EKS and GKE I guess we should bump the k8s provider version in these providers?

Do not cache kubeconfig for exec based auth in AsyncKubernetesHook

19661ff

amoghrajesh requested review from hussein-awala, jedcunningham and jscheffl as code owners April 14, 2026 12:17

boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Apr 14, 2026

amoghrajesh requested review from phanikumv, potiuk and shahar1 April 14, 2026 12:18

amoghrajesh self-assigned this Apr 14, 2026

amoghrajesh requested a review from eladkal April 14, 2026 12:20

amoghrajesh mentioned this pull request Apr 14, 2026

Avoid caching kubeconfig for exec-based auth in AsyncKubernetesHook #61738

Draft

shahar1 approved these changes Apr 14, 2026

sunank200 reviewed Apr 14, 2026

jscheffl requested a review from Copilot April 14, 2026 18:39

Copilot started reviewing on behalf of jscheffl April 14, 2026 18:39

Copilot AI reviewed Apr 14, 2026

handling comments from copilot and ankit

05d6802

amoghrajesh wants to merge 2 commits into apache:main from astronomer:pr-61738-review

5 participants
