@javanlacerda
Contributor

No description provided.

This commit introduces the Kubernetes job client and service, analogous to the existing GCP Batch client and service.

Changes include:
- Added 'kubernetes' dependency to .
- Implemented  in  to interact with the Kubernetes API for job creation and management, supporting both raw Kubernetes jobs and Kata Containers.
- Created  to encapsulate the job creation logic.
- Created  as a public entrypoint for creating Kubernetes jobs.
- Added a  file for the new Kubernetes platform directory to manage dependencies.
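
As a rough illustration of the job creation described above, the sketch below builds a batch/v1 Job manifest as a plain dict. The function name `build_job_manifest` and the RuntimeClass name `kata` are assumptions made for this example and are not taken from the PR.

```python
from typing import Any, Dict


def build_job_manifest(name: str, image: str,
                       is_kata: bool = False) -> Dict[str, Any]:
    """Return a batch/v1 Job manifest as a plain dict (illustrative only)."""
    pod_spec: Dict[str, Any] = {
        "restartPolicy": "Never",
        "containers": [{"name": name, "image": image}],
    }
    if is_kata:
        # Assumption: the cluster exposes a RuntimeClass named "kata".
        pod_spec["runtimeClassName"] = "kata"
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {"backoffLimit": 0, "template": {"spec": pod_spec}},
    }
```

With the official Python client, a dict of this shape can be passed as the `body` argument of `BatchV1Api.create_namespaced_job`; whether the PR builds dicts or typed client objects is not visible in this conversation.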
This commit adds an end-to-end test for the Kubernetes service, which verifies the job creation process using a real  cluster.

The test:
- Creates a  cluster in .
- Tears down the  cluster in .
- Uses a real Docker image to create a Kubernetes job.
- Verifies that the job is created and runs to completion.
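
The cluster lifecycle listed above could look roughly like the following sketch. It assumes the cluster tool is kind (mentioned later in this conversation) and that setup and teardown live in unittest's class-level hooks; the cluster name and the `runner` attribute are hypothetical.

```python
import subprocess
import unittest

CLUSTER_NAME = "k8s-e2e-test"  # hypothetical cluster name


def run_checked(cmd):
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


class KindClusterTestCase(unittest.TestCase):
    """Base class that owns a kind cluster for the lifetime of the test class."""

    # Indirection so the command runner can be swapped out in dry runs.
    runner = staticmethod(run_checked)

    @classmethod
    def setUpClass(cls):
        cls.runner(["kind", "create", "cluster", "--name", CLUSTER_NAME])

    @classmethod
    def tearDownClass(cls):
        # Both hooks must be classmethods; unittest calls them on the class.
        cls.runner(["kind", "delete", "cluster", "--name", CLUSTER_NAME])
```

Test classes that need a live cluster would subclass `KindClusterTestCase` and add their own `test_*` methods.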

This commit also includes a fix for a  in the  that occurred when a job spec did not contain an env section.
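
The commit message does not show the fix itself. A minimal sketch of one defensive pattern for a container spec without an env key follows, assuming the spec is handled as a dict; both helper names are hypothetical.

```python
from typing import Any, Dict, List


def get_container_env(container_spec: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return a copy of the env list, or an empty list if it is missing or None."""
    return list(container_spec.get("env") or [])


def add_env_var(container_spec: Dict[str, Any], name: str,
                value: str) -> Dict[str, Any]:
    """Append an environment variable, creating the env list when needed."""
    env = get_container_env(container_spec)
    env.append({"name": name, "value": value})
    container_spec["env"] = env
    return container_spec
```

Using `dict.get` with a fallback avoids indexing a key that may be absent, which is the kind of failure the message describes.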
This commit adds unit tests for the , mocking the Kubernetes API to verify the  and  methods.

It also includes:
- Adding the  dependency to  and .
- Updating  to reflect the new dependency.
This commit updates the  package version from a wildcard () and an older version to the latest stable version (). This ensures a stable and predictable dependency.

The  has been updated accordingly.
This commit introduces a new GitHub Action workflow to run the Kubernetes end-to-end test on every pull request.

The workflow leverages a new script, , which is responsible for setting up the test environment and running the test. This follows the existing CI conventions in the project.
This commit refactors the Kubernetes service by moving  to . This change resolves a namespace collision with the  Python client library.

All import paths and Bazel build files have been updated accordingly, and tests have been verified to pass.
This commit adds the necessary  and  files for the new  internal package. These files are essential for Bazel to correctly build and manage dependencies for the Kubernetes service, which was recently moved to this new directory.
Signed-off-by: Javan Lacerda <[email protected]>
This commit consolidates the Kubernetes job creation logic by moving the contents of  into . The redundant  file has been deleted.

This simplifies the overall structure of the Kubernetes platform integration by centralizing job creation within a single service file. Corresponding  files and import statements have been updated, and  is now added to the repository.
Signed-off-by: Javan Lacerda <[email protected]>
@javanlacerda force-pushed the feature/kubernetes-job-client branch from 1c2cc3d to af7263c on December 17, 2025 20:58
Signed-off-by: Javan Lacerda <[email protected]>
This commit refactors the GCP Batch integration by merging the  logic from  directly into the  in . The now-redundant  file has been deleted.

This change simplifies the architecture by embedding client logic within the service layer, making the  a self-contained implementation of the .

Additionally, the  data structure has been removed, and all parts of the codebase now use the common  interface.
Signed-off-by: Javan Lacerda <[email protected]>
This refactor removes the  file and introduces a  that proportionally distributes tasks between  and . This allows for A/B testing and performance comparisons between the two platforms.
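
One way to picture the proportional distribution is the sketch below. It assumes the split is a fixed fraction applied to a list of tasks; the function name `split_tasks` and the slicing strategy are illustrative and may differ from the gate's actual logic.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_tasks(tasks: Sequence[T],
                kubernetes_fraction: float) -> Tuple[List[T], List[T]]:
    """Split tasks into (gcp_batch_tasks, kubernetes_tasks) by a fixed fraction."""
    if not 0.0 <= kubernetes_fraction <= 1.0:
        raise ValueError("kubernetes_fraction must be between 0 and 1")
    # Round to the nearest whole task; the tail of the list goes to Kubernetes.
    k8s_count = round(len(tasks) * kubernetes_fraction)
    cut = len(tasks) - k8s_count
    return list(tasks[:cut]), list(tasks[cut:])
```

A fraction of 0.0 keeps every task on GCP Batch, and 1.0 sends every task to Kubernetes.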
- Update RemoteTask interface to include create_uworker_main_batch_jobs.
- Refactor KubernetesService and GcpBatchService to match new interface.
- Fix TypeError in k8s_service_e2e_test.py by adding @classmethod to tearDownClass.
- Move and update tests for batch and k8s services.
- Fix unused argument warnings in tests.
- Fix unnecessary lambda warning.
- Update create_job call in kubernetes_test.py to include docker_image.
- Apply yapf formatting.
- Introduce KubernetesJobConfig to encapsulate job configuration.
- Update create_job to accept KubernetesJobConfig.
- Refactor create_uworker_main_batch_jobs to use the new config.
- Update k8s_service_e2e_test.py and kubernetes_test.py to match the new API.
- Remove redundant kind installation from ci_tests.bash.
- Update KubernetesJobConfig instantiation in kubernetes_test.py and k8s_service_e2e_test.py to include 'is_kata=False'.
- Switch is_kata flag to True in kubernetes_test.py and k8s_service_e2e_test.py to verify Kata container job creation path.
- Fix e2e tests to use is_kata=False for standard jobs, as the test environment (Kind) may not support Kata containers.
- Update KubernetesService to support GKE credential loading and Kata containers.
- Implement task splitting logic in RemoteTaskGate based on job frequency.
- Update RemoteTaskGateTest to verify task routing and slicing.
- Update run_remote_task.py to use RemoteTaskGate.
- Update UTASK_MAIN_QUEUE for testing purposes.
- Refactor run_bot.py to use RemoteTaskGate instead of GcpBatchService for task scheduling.
- Revert default job frequency to 0% Kubernetes in job_frequency.py.

Signed-off-by: Javan Lacerda <[email protected]>
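
The notes above introduce KubernetesJobConfig without listing its fields. The sketch below shows how such a config object might look, assuming a dataclass; only the names `docker_image` and `is_kata` appear in the commit notes, while `job_name` and `summarize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KubernetesJobConfig:
    """Illustrative field set; the real class may carry more or different fields."""
    job_name: str          # hypothetical field
    docker_image: str      # named in the commit notes
    is_kata: bool = False  # named in the commit notes


def summarize(config: KubernetesJobConfig) -> str:
    """Return a one-line description of the job the config would create."""
    runtime = "kata" if config.is_kata else "default"
    return f"{config.job_name}: image={config.docker_image}, runtime={runtime}"
```

Passing one config object instead of several positional arguments means a new option such as `is_kata` can be added without changing every call site's argument order.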
@javanlacerda force-pushed the feature/kubernetes-job-client branch 2 times, most recently from efb87ef to 47851b8 on December 23, 2025 15:07
- K8s Service: Update Kata container job spec with hostNetwork: True, HOST_UID=1337, capabilities: ALL, and standardized volume size. Skip default credential loading if K8S_E2E env var is set.
- K8s Tests: Update unit tests to verify spec generation and mock correctly. Update e2e tests to verify job Running status instead of completion to avoid timeouts with default command. Skip e2e test if K8S_E2E env var is not set.
- Local Tests: Update kubernetes e2e test script with correct filename.
- Batch Service Test: Fix mock return value to be a Job object to resolve AttributeError.
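
The Kata settings named in the first bullet can be sketched as a pod spec fragment. This is an assumption-laden illustration: the RuntimeClass name `kata`, the placement of `ALL` under `capabilities.add`, and both function names are not confirmed by the PR, and the volume settings are omitted because their values are not given.

```python
from typing import Any, Dict, Mapping


def kata_pod_spec(name: str, image: str) -> Dict[str, Any]:
    """Return a pod spec fragment carrying the settings listed in the commit note."""
    return {
        "runtimeClassName": "kata",  # assumption: RuntimeClass name
        "hostNetwork": True,
        "restartPolicy": "Never",
        "containers": [{
            "name": name,
            "image": image,
            # Kubernetes env values are strings, even for numeric IDs.
            "env": [{"name": "HOST_UID", "value": "1337"}],
            # Assumption: "capabilities: ALL" means adding all capabilities.
            "securityContext": {"capabilities": {"add": ["ALL"]}},
        }],
    }


def should_load_default_credentials(environ: Mapping[str, str]) -> bool:
    """Skip default credential loading when K8S_E2E is set to a non-empty value."""
    return not environ.get("K8S_E2E")
```

In real use, `os.environ` would be passed to `should_load_default_credentials`; taking the mapping as a parameter keeps the check easy to test.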
@javanlacerda force-pushed the feature/kubernetes-job-client branch 2 times, most recently from 28146af to a98e940 on December 23, 2025 17:30
- K8s Service: Update Kata container job spec with hostNetwork: True, HOST_UID=1337, capabilities: ALL, and standardized volume size. Skip default credential loading if K8S_E2E env var is set.
- K8s Tests: Update unit tests to verify spec generation and mock correctly. Update e2e tests to verify job Running status instead of completion to avoid timeouts with default command. Skip e2e test if K8S_E2E env var is not set.
- Local Tests: Update kubernetes e2e test script with correct filename.
- Batch Service Test: Fix mock return value to be a Job object to resolve AttributeError.
- Deps: Add google-api-python-client, aiohttp, and google-cloud-storage to root Pipfile.
@javanlacerda force-pushed the feature/kubernetes-job-client branch 2 times, most recently from 55a574b to 8a7cefc on December 23, 2025 18:37
- K8s Service: Update Kata container job spec with hostNetwork: True, HOST_UID=1337, capabilities: ALL, and standardized volume size. Skip default credential loading if K8S_E2E env var is set.
- K8s Tests: Update unit tests to verify spec generation and mock correctly. Update e2e tests to verify job Running status instead of completion to avoid timeouts with default command. Skip e2e test if K8S_E2E env var is not set.
- Local Tests: Update kubernetes e2e test script with correct filename.
- Batch Service Test: Fix mock return value to be a Job object to resolve AttributeError.
- Deps: Add necessary google-cloud and http libs to root Pipfile for e2e tests.
@javanlacerda force-pushed the feature/kubernetes-job-client branch from 8a7cefc to 7b08cbf on December 23, 2025 18:45
- K8s Service: Update Kata container job spec with hostNetwork: True, HOST_UID=1337, capabilities: ALL, and standardized volume size. Skip default credential loading if K8S_E2E env var is set.
- K8s Tests: Update unit tests to verify spec generation and mock correctly. Update e2e tests to verify job Running status instead of completion to avoid timeouts with default command. Skip e2e test if K8S_E2E env var is not set.
- Local Tests: Update kubernetes e2e test script with correct filename.
- Batch Service Test: Fix mock return value to be a Job object to resolve AttributeError.
- Deps: Add necessary google-cloud and http libs to root Pipfile for e2e tests.
- CI: Install JDK 21 in kubernetes-e2e-tests workflow for Datastore emulator.
@javanlacerda force-pushed the feature/kubernetes-job-client branch from 8073263 to f40cbdb on December 23, 2025 19:14
Signed-off-by: Javan Lacerda <[email protected]>
@javanlacerda force-pushed the feature/kubernetes-job-client branch 3 times, most recently from 202a012 to 036b6bb on December 23, 2025 22:55
- K8s Service: Update Kata container job spec with hostNetwork: True, HOST_UID=1337, capabilities: ALL, and standardized volume size. Skip default credential loading if K8S_E2E env var is set.
- K8s Tests: Update unit tests to verify spec generation and mock correctly. Patch _load_gke_credentials in unit tests to avoid default credential errors. Update e2e tests to verify job Running status instead of completion to avoid timeouts with default command. Skip e2e test if K8S_E2E env var is not set.
- Local Tests: Update kubernetes e2e test script with correct filename.
- Batch Service Test: Fix mock return value to be a Job object to resolve AttributeError.
- Deps: Add google-api-python-client, aiohttp, and google-cloud libs to root Pipfile for e2e tests.
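
The two testing techniques mentioned in the K8s Tests bullet, patching the credential loader in unit tests and gating the e2e test on K8S_E2E, can be sketched with the standard library alone. `FakeKubernetesService` is a stand-in written for this example; only the method name `_load_gke_credentials` and the variable name `K8S_E2E` come from the notes above.

```python
import os
import unittest
from unittest import mock


class FakeKubernetesService:
    """Stand-in service whose constructor needs credentials."""

    def __init__(self):
        self._load_gke_credentials()

    def _load_gke_credentials(self):
        # Simulates a machine with no default credentials configured.
        raise RuntimeError("no default credentials available")


class ServiceUnitTest(unittest.TestCase):

    def test_init_without_credentials(self):
        # Patching the loader lets the constructor run without credentials.
        with mock.patch.object(FakeKubernetesService,
                               "_load_gke_credentials") as loader:
            FakeKubernetesService()
        loader.assert_called_once_with()


@unittest.skipUnless(os.environ.get("K8S_E2E"), "K8S_E2E is not set")
class ServiceE2ETest(unittest.TestCase):

    def test_job_reaches_running(self):
        """Placeholder: a real test would create a job and poll for Running."""
```

Without `K8S_E2E` in the environment, the whole e2e class is reported as skipped rather than failing on a missing cluster.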
@javanlacerda force-pushed the feature/kubernetes-job-client branch from bc1c5fb to 2bb20c8 on December 24, 2025 04:19