Do not cache kubeconfig for exec based auth in AsyncKubernetesHook#65212
Do not cache kubeconfig for exec based auth in AsyncKubernetesHook#65212amoghrajesh wants to merge 2 commits intoapache:mainfrom
Conversation
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/hooks/kubernetes.py
Show resolved
Hide resolved
| if not self._uses_exec_auth(self.config_dict, context=cluster_context): | ||
| self._config_loaded = True | ||
|
|
||
| return |
There was a problem hiding this comment.
nit: When exec auth is detected, load_kube_config_from_dict is called on every _load_config() invocation since _config_loaded stays False. Should we consider separating config initialised from config cached to avoid redundant loads?
There was a problem hiding this comment.
Yeah actually having a new config for this might turn out cleaner. Handling that using a new private class var: _is_exec_auth, with an intent to run _uses_exec_auth once per instance.
There was a problem hiding this comment.
Pull request overview
This PR updates AsyncKubernetesHook kubeconfig loading to avoid caching configs that use exec-based authentication (e.g. EKS/GKE), preventing expired short-lived tokens from being reused across long-running/deferrable tasks.
Changes:
- Add
_uses_exec_auth()to detect exec-based auth in the active kubeconfig context user (with a conservative fallback). - Skip setting
_config_loaded(i.e., disable caching) when exec-based auth is detected; keep existing caching behavior for non-exec auth. - Add unit tests covering
_uses_exec_auth()and caching behavior for dict/string kubeconfigs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/hooks/kubernetes.py |
Adds exec-auth detection and conditionally disables config caching to avoid reusing expired exec tokens. |
providers/cncf/kubernetes/tests/unit/cncf/kubernetes/hooks/test_kubernetes.py |
Adds tests for exec-auth detection and verifies caching vs non-caching behavior for kubeconfig inputs. |
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/hooks/kubernetes.py
Outdated
Show resolved
Hide resolved
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/hooks/kubernetes.py
Show resolved
Hide resolved
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/hooks/kubernetes.py
Outdated
Show resolved
Hide resolved
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/hooks/kubernetes.py
Outdated
Show resolved
Hide resolved
|
Isn't that the same issue as #63610 which we concluded as won't fix since the solution is to bump botocore version #60943 (comment) ? |
|
@eladkal I wasn't aware of that one so I investigated a bit. They are actually distinct issues. #63610 was trying to fix a race condition, ie: concurrent tasks fighting over the aws cli cache dir, causing This PR is different and is a token expiry issue specific to |
Since this fix is for EKS and GKE I guess we should bump the k8s provider version in these providers? |
Was generative AI tooling used to co-author this PR?
Reattempt at: #61738
closes: #61738
Deferrable tasks on EKS/GKE can run longer than the 15-minute token lifetime issued by exec auth plugins (aws eks get-token, gke-gcloud-auth-plugin). Because
_load_config()caches the kubeconfig on the first call, subsequentcalls are no operations and the expired token gets reused, causing auth failures.
Attempting to redo the PR by addressing review comments from myself
_uses_exec_auth()to detect exec-based auth in the active context user only (not any user in the file), passing{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.