Feature/kubernetes job client #5095
Draft
javanlacerda wants to merge 40 commits into master from feature/kubernetes-job-client
This commit introduces the Kubernetes job client and service, analogous to the existing GCP Batch client and service. Changes include:
- Added 'kubernetes' dependency to .
- Implemented in to interact with the Kubernetes API for job creation and management, supporting both raw Kubernetes jobs and Kata Containers.
- Created to encapsulate the job creation logic.
- Created as a public entrypoint for creating Kubernetes jobs.
- Added a file for the new Kubernetes platform directory to manage dependencies.
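To make the job creation step concrete, here is a minimal sketch of a `batch/v1` Job manifest built as a plain dict. The helper name `build_job_manifest`, its parameters, and the `runtime_class` value are assumptions for illustration; the PR's actual client code is not shown on this page.

```python
def build_job_manifest(name, image, command=None, runtime_class=None):
  """Returns a batch/v1 Job manifest as a plain dict (hypothetical helper)."""
  container = {'name': name, 'image': image}
  if command:
    container['command'] = list(command)

  pod_spec = {'containers': [container], 'restartPolicy': 'Never'}
  if runtime_class:
    # A sandboxed runtime such as Kata Containers is selected through the
    # pod's runtimeClassName. The class name itself is cluster-specific.
    pod_spec['runtimeClassName'] = runtime_class

  return {
      'apiVersion': 'batch/v1',
      'kind': 'Job',
      'metadata': {'name': name},
      'spec': {'template': {'spec': pod_spec}, 'backoffLimit': 0},
  }
```

A dict like this could be submitted with the Kubernetes Python client, for example as the `body` argument of `BatchV1Api().create_namespaced_job(namespace=..., body=...)`. Whether the PR builds dicts or typed client objects is not stated here.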
This commit adds an end-to-end test for the Kubernetes service, which verifies the job creation process using a real cluster. The test:
- Creates a cluster in .
- Tears down the cluster in .
- Uses a real Docker image to create a Kubernetes job.
- Verifies that the job is created and runs to completion.

This commit also includes a fix for a in the that occurred when a job spec did not contain an env section.
This commit adds unit tests for the , mocking the Kubernetes API to verify the and methods. It also includes:
- Adding the dependency to and .
- Updating to reflect the new dependency.
This commit updates the package version from a wildcard () and an older version to the latest stable version (). This ensures a stable and predictable dependency. The has been updated accordingly.
This commit introduces a new GitHub Action workflow to run the Kubernetes end-to-end test on every pull request. The workflow leverages a new script, , which is responsible for setting up the test environment and running the test. This follows the existing CI conventions in the project.
This commit refactors the Kubernetes service by moving to . This change resolves a namespace collision with the Python client library. All import paths and Bazel build files have been updated accordingly, and tests have been verified to pass.
This commit adds the necessary and files for the new internal package. These files are essential for Bazel to correctly build and manage dependencies for the Kubernetes service, which was recently moved to this new directory.
Signed-off-by: Javan Lacerda <[email protected]>
This commit consolidates the Kubernetes job creation logic by moving the contents of into . The redundant file has been deleted. This simplifies the overall structure of the Kubernetes platform integration by centralizing job creation within a single service file. Corresponding files and import statements have been updated, and is now added to the repository.
Signed-off-by: Javan Lacerda <[email protected]>
1c2cc3d to af7263c
This commit refactors the GCP Batch integration by merging the logic from directly into the in . The now-redundant file has been deleted. This change simplifies the architecture by embedding client logic within the service layer, making the a self-contained implementation of the . Additionally, the data structure has been removed, and all parts of the codebase now use the common interface.
Signed-off-by: Javan Lacerda <[email protected]>
This refactor removes the file and introduces a that proportionally distributes tasks between and . This allows for A/B testing and performance comparisons between the two platforms.
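The proportional distribution described above can be sketched as a simple slice of the task list. The function name `split_tasks` and the `kubernetes_fraction` parameter are hypothetical; the real gate's interface and how it reads the frequency are not shown on this page.

```python
def split_tasks(tasks, kubernetes_fraction):
  """Splits tasks into (kubernetes_tasks, batch_tasks) by proportion.

  Hypothetical helper: kubernetes_fraction is the share of tasks, between
  0.0 and 1.0, that should be routed to Kubernetes. The rest go to GCP Batch.
  """
  if not 0.0 <= kubernetes_fraction <= 1.0:
    raise ValueError('kubernetes_fraction must be in [0, 1]')
  # round() avoids losing a task to floating-point error in the product.
  cut = round(len(tasks) * kubernetes_fraction)
  return tasks[:cut], tasks[cut:]
```

With a fraction of 0.0 every task stays on the existing backend, so a gate built this way can be merged with no behavior change and the share raised gradually.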
- Update RemoteTask interface to include create_uworker_main_batch_jobs.
- Refactor KubernetesService and GcpBatchService to match new interface.
- Fix TypeError in k8s_service_e2e_test.py by adding @classmethod to tearDownClass.
- Move and update tests for batch and k8s services.
- Fix unused argument warnings in tests.
- Fix unnecessary lambda warning.
- Update create_job call in kubernetes_test.py to include docker_image.
- Apply yapf formatting.
- Introduce KubernetesJobConfig to encapsulate job configuration.
- Update create_job to accept KubernetesJobConfig.
- Refactor create_uworker_main_batch_jobs to use the new config.
- Update k8s_service_e2e_test.py and kubernetes_test.py to match the new API.
- Remove redundant kind installation from ci_tests.bash.
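A configuration object of this kind might look like the sketch below. Only the `docker_image` and `is_kata` names appear in this PR's commit messages; the field types, the frozen dataclass, and the `describe_job` helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KubernetesJobConfig:
  """Sketch of a job config; the real class may carry more fields."""
  docker_image: str
  # No default, so every call site must choose a runtime explicitly.
  is_kata: bool


def describe_job(config: KubernetesJobConfig) -> str:
  """Hypothetical helper showing how a consumer might read the config."""
  runtime = 'kata' if config.is_kata else 'default'
  return f'{config.docker_image} ({runtime} runtime)'
```

Passing one object instead of a growing argument list means a new option, such as the Kata flag, changes the config class rather than every function signature along the call path.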
- Update KubernetesJobConfig instantiation in kubernetes_test.py and k8s_service_e2e_test.py to include 'is_kata=False'.
- Switch is_kata flag to True in kubernetes_test.py and k8s_service_e2e_test.py to verify Kata container job creation path.
- Fix e2e tests to use is_kata=False for standard jobs, as the test environment (Kind) may not support Kata containers.
- Update KubernetesService to support GKE credential loading and Kata containers.
- Implement task splitting logic in RemoteTaskGate based on job frequency.
- Update RemoteTaskGateTest to verify task routing and slicing.
- Update run_remote_task.py to use RemoteTaskGate.
- Update UTASK_MAIN_QUEUE for testing purposes.
- Refactor run_bot.py to use RemoteTaskGate instead of GcpBatchService for task scheduling.
- Revert default job frequency to 0% Kubernetes in job_frequency.py.

Signed-off-by: Javan Lacerda <[email protected]>
efb87ef to 47851b8
- K8s Service: Update Kata container job spec with hostNetwork: True, HOST_UID=1337, capabilities: ALL, and standardized volume size. Skip default credential loading if K8S_E2E env var is set.
- K8s Tests: Update unit tests to verify spec generation and mock correctly. Update e2e tests to verify job Running status instead of completion to avoid timeouts with default command. Skip e2e test if K8S_E2E env var is not set.
- Local Tests: Update kubernetes e2e test script with correct filename.
- Batch Service Test: Fix mock return value to be a Job object to resolve AttributeError.
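The Kata job spec settings listed above map onto standard pod spec fields roughly as in the sketch below. The function name, the container name `worker`, and the `runtime_class` default are assumptions; the volume sizing mentioned in the commit is omitted because its values are not given here.

```python
def kata_pod_spec(image, runtime_class='kata'):
  """Returns a pod spec dict with the settings named in the commit message.

  Hypothetical helper: the runtime class name depends on how the cluster
  registers its Kata runtime.
  """
  return {
      'runtimeClassName': runtime_class,
      # hostNetwork is a pod-level field.
      'hostNetwork': True,
      'restartPolicy': 'Never',
      'containers': [{
          'name': 'worker',
          'image': image,
          # Environment values in a pod spec are always strings.
          'env': [{'name': 'HOST_UID', 'value': '1337'}],
          # Capabilities are granted per container via securityContext.
          'securityContext': {'capabilities': {'add': ['ALL']}},
      }],
  }
```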
28146af to a98e940
- K8s Service: Update Kata container job spec with hostNetwork: True, HOST_UID=1337, capabilities: ALL, and standardized volume size. Skip default credential loading if K8S_E2E env var is set.
- K8s Tests: Update unit tests to verify spec generation and mock correctly. Update e2e tests to verify job Running status instead of completion to avoid timeouts with default command. Skip e2e test if K8S_E2E env var is not set.
- Local Tests: Update kubernetes e2e test script with correct filename.
- Batch Service Test: Fix mock return value to be a Job object to resolve AttributeError.
- Deps: Add google-api-python-client, aiohttp, and google-cloud-storage to root Pipfile.
55a574b to 8a7cefc
- K8s Service: Update Kata container job spec with hostNetwork: True, HOST_UID=1337, capabilities: ALL, and standardized volume size. Skip default credential loading if K8S_E2E env var is set.
- K8s Tests: Update unit tests to verify spec generation and mock correctly. Update e2e tests to verify job Running status instead of completion to avoid timeouts with default command. Skip e2e test if K8S_E2E env var is not set.
- Local Tests: Update kubernetes e2e test script with correct filename.
- Batch Service Test: Fix mock return value to be a Job object to resolve AttributeError.
- Deps: Add necessary google-cloud and http libs to root Pipfile for e2e tests.
8a7cefc to 7b08cbf
- K8s Service: Update Kata container job spec with hostNetwork: True, HOST_UID=1337, capabilities: ALL, and standardized volume size. Skip default credential loading if K8S_E2E env var is set.
- K8s Tests: Update unit tests to verify spec generation and mock correctly. Update e2e tests to verify job Running status instead of completion to avoid timeouts with default command. Skip e2e test if K8S_E2E env var is not set.
- Local Tests: Update kubernetes e2e test script with correct filename.
- Batch Service Test: Fix mock return value to be a Job object to resolve AttributeError.
- Deps: Add necessary google-cloud and http libs to root Pipfile for e2e tests.
- CI: Install JDK 21 in kubernetes-e2e-tests workflow for Datastore emulator.
8073263 to f40cbdb
202a012 to 036b6bb
- K8s Service: Update Kata container job spec with hostNetwork: True, HOST_UID=1337, capabilities: ALL, and standardized volume size. Skip default credential loading if K8S_E2E env var is set.
- K8s Tests: Update unit tests to verify spec generation and mock correctly. Patch _load_gke_credentials in unit tests to avoid default credential errors. Update e2e tests to verify job Running status instead of completion to avoid timeouts with default command. Skip e2e test if K8S_E2E env var is not set.
- Local Tests: Update kubernetes e2e test script with correct filename.
- Batch Service Test: Fix mock return value to be a Job object to resolve AttributeError.
- Deps: Add google-api-python-client, aiohttp, and google-cloud libs to root Pipfile for e2e tests.
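The two testing techniques named above, gating the e2e test on K8S_E2E and patching the credential loader in unit tests, can be sketched with the standard library alone. `FakeKubernetesService` and both test classes are hypothetical stand-ins; only the `K8S_E2E` variable and the `_load_gke_credentials` name come from the commit message.

```python
import os
import unittest
from unittest import mock


class FakeKubernetesService:
  """Hypothetical stand-in for a service that loads credentials on init."""

  def __init__(self):
    self.credentials = self._load_gke_credentials()

  def _load_gke_credentials(self):
    # Simulates an environment with no default credentials available.
    raise RuntimeError('no default credentials in this environment')


@unittest.skipUnless(os.getenv('K8S_E2E'), 'K8S_E2E is not set')
class ServiceE2ETest(unittest.TestCase):
  """Only runs when the caller opts in by setting K8S_E2E."""

  def test_placeholder(self):
    self.assertTrue(os.getenv('K8S_E2E'))


class ServiceUnitTest(unittest.TestCase):

  def test_init_without_real_credentials(self):
    # Patching the loader keeps the unit test independent of the machine's
    # credential setup.
    with mock.patch.object(
        FakeKubernetesService, '_load_gke_credentials', return_value='fake'):
      service = FakeKubernetesService()
    self.assertEqual(service.credentials, 'fake')
```

Note that `skipUnless` evaluates its condition when the class is defined, so the variable has to be set before the test module is imported.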
bc1c5fb to 2bb20c8
No description provided.