Conversation

@enyst enyst commented Dec 24, 2025

Summary

  • Replace GPT-5.1 Codex Max with GPT-5.2 Codex in .github/workflows/integration-runner.yml so integration tests exercise the latest Codex variant.
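
For context, the change amounts to swapping one model id in the workflow's matrix. A minimal sketch of what such an entry might look like — the actual key names and layout in `integration-runner.yml` may differ; this is illustrative only:

```yaml
# .github/workflows/integration-runner.yml (illustrative sketch; real keys may differ)
jobs:
  integration:
    strategy:
      matrix:
        model:
          # - litellm_proxy/gpt-5.1-codex-max   # removed
          - litellm_proxy/gpt-5.2-codex          # added
```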

Verification

  • Ran examples/01_standalone_sdk/01_hello_world.py locally against the eval proxy with model litellm_proxy/gpt-5.2-codex. The run reached the proxy and attempted to use the model, confirming that the model id resolves at the proxy; actual access to gpt-5.2-codex may depend on environment credentials. In CI, the integration workflow uses the managed LLM proxy and key.

Notes

  • No other tests modified. This PR only updates the workflow matrix entry.

Co-authored-by: openhands <[email protected]>



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image | Docs / Tags
--- | --- | --- | ---
java | amd64, arm64 | eclipse-temurin:17-jdk | Link
python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link
golang | amd64, arm64 | golang:1.21-bookworm | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:7743988-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-7743988-python \
  ghcr.io/openhands/agent-server:7743988-python

All tags pushed for this build

ghcr.io/openhands/agent-server:7743988-golang-amd64
ghcr.io/openhands/agent-server:7743988-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:7743988-golang-arm64
ghcr.io/openhands/agent-server:7743988-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:7743988-java-amd64
ghcr.io/openhands/agent-server:7743988-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:7743988-java-arm64
ghcr.io/openhands/agent-server:7743988-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:7743988-python-amd64
ghcr.io/openhands/agent-server:7743988-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:7743988-python-arm64
ghcr.io/openhands/agent-server:7743988-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:7743988-golang
ghcr.io/openhands/agent-server:7743988-java
ghcr.io/openhands/agent-server:7743988-python

About Multi-Architecture Support

  • Each variant tag (e.g., 7743988-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 7743988-python-amd64) are also available if needed

…place GPT-5.1 Codex Max with GPT-5.2 Codex in integration-runner.yml matrix so CI exercises the latest codex family.

Co-authored-by: openhands <[email protected]>
@all-hands-bot all-hands-bot left a comment

Thanks!

@xingyaoww xingyaoww left a comment

Thanks!! (I was messing around with the bot account and forgot to log out 😓 )

@xingyaoww xingyaoww added the integration-test label (Runs the integration tests and comments the results) Dec 28, 2025
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

🧪 Integration Tests Results

Overall Success Rate: 94.1%
Total Cost: $1.76
Models Tested: 6
Timestamp: 2025-12-28 19:10:17 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Integration (Required) | Behavior (Optional) | Tests Passed | Skipped | Total | Cost | Tokens
--- | --- | --- | --- | --- | --- | --- | --- | ---
litellm_proxy_mistral_devstral_2512 | 87.5% | 87.5% | N/A | 7/8 | 1 | 9 | $0.16 | 378,358
litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.36 | 564,891
litellm_proxy_deepseek_deepseek_chat | 100.0% | 100.0% | N/A | 8/8 | 1 | 9 | $0.07 | 702,676
litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 100.0% | N/A | 9/9 | 0 | 9 | $0.46 | 268,469
litellm_proxy_gpt_5.1_codex_max | 77.8% | 77.8% | N/A | 7/9 | 0 | 9 | $0.19 | 208,981
litellm_proxy_vertex_ai_gemini_3_pro_preview | 100.0% | 100.0% | N/A | 9/9 | 0 | 9 | $0.51 | 341,318
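
The headline 94.1% appears to be computed over all executed tests rather than by averaging the per-model percentages: summing the passed/run counts above gives 48/51 ≈ 94.1%, whereas the mean of the six rates would be ≈94.2%. A small sketch of that aggregation (counts copied from the table; the dict and variable names are ours):

```python
# Aggregate integration results: overall rate = total passed / total run,
# not the mean of per-model percentages.
results = {
    "mistral_devstral_2512": (7, 8),
    "moonshot_kimi_k2_thinking": (8, 8),
    "deepseek_deepseek_chat": (8, 8),
    "claude_sonnet_4_5_20250929": (9, 9),
    "gpt_5.1_codex_max": (7, 9),
    "vertex_ai_gemini_3_pro_preview": (9, 9),
}

passed = sum(p for p, _ in results.values())   # 48
run = sum(r for _, r in results.values())      # 51
overall = 100 * passed / run
print(f"{passed}/{run} = {overall:.1f}%")      # 48/51 = 94.1%
```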

📋 Detailed Results

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 87.5% (7/8)
  • Integration Tests (Required): 87.5% (7/9)
  • Total Cost: $0.16
  • Token Usage: prompt: 374,809, completion: 3,549
  • Run Suffix: litellm_proxy_mistral_devstral_2512_d6e2cc1_devstral_2512_run_N9_20251228_190101
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello ⚠️ REQUIRED: Shell script is not executable (Cost: $0.0084)

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.36
  • Token Usage: prompt: 551,324, completion: 13,567, cache_read: 485,120
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_d6e2cc1_kimi_k2_run_N9_20251228_190100
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.07
  • Token Usage: prompt: 689,669, completion: 13,007, cache_read: 633,408
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_d6e2cc1_deepseek_run_N9_20251228_190101
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.46
  • Token Usage: prompt: 259,016, completion: 9,453, cache_read: 188,035, cache_write: 70,503, reasoning: 2,472
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_d6e2cc1_sonnet_run_N9_20251228_190100

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 77.8% (7/9)
  • Integration Tests (Required): 77.8% (7/9)
  • Total Cost: $0.19
  • Token Usage: prompt: 203,023, completion: 5,958, cache_read: 108,672, reasoning: 3,904
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_d6e2cc1_gpt51_codex_run_N9_20251228_190101

Failed Tests:

  • t09_token_condenser ⚠️ REQUIRED: Condensation not triggered. Token counting may not work. (Cost: $0.0059)
  • t06_github_pr_browsing ⚠️ REQUIRED: Agent's final answer does not contain the expected information about the PR content. Final answer preview: I don’t have network access here, so I can’t open that GitHub PR. If you paste the PR description or comments (especially @asadm’s), I can summarize and explain what’s happening.... (Cost: $0.01)

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.51
  • Token Usage: prompt: 323,623, completion: 17,695, cache_read: 196,601, reasoning: 12,394
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_d6e2cc1_gemini_3_pro_run_N9_20251228_190101

@xingyaoww

We probably need to merge it to test it

@xingyaoww xingyaoww merged commit 0d18f79 into main Dec 28, 2025
53 checks passed
@xingyaoww xingyaoww deleted the chore/update-integration-codex-5_2 branch December 28, 2025 19:25
xingyaoww added a commit that referenced this pull request Dec 28, 2025

Labels

ci, integration, integration-test (Runs the integration tests and comments the results)
