fix(condenser): Tool-loop aware condensation #1508
Conversation
Coverage Report
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
🧪 Integration Tests Results

Overall Success Rate: 96.1%

📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

📋 Detailed Results
- litellm_proxy_gpt_5.1_codex_max
- litellm_proxy_claude_sonnet_4_5_20250929
- litellm_proxy_deepseek_deepseek_chat
  - Skipped Tests:
- litellm_proxy_mistral_devstral_2512
  - Skipped Tests:
  - Failed Tests:
- litellm_proxy_vertex_ai_gemini_3_pro_preview
- litellm_proxy_moonshot_kimi_k2_thinking
  - Skipped Tests:
  - Failed Tests:
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state.
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
🧪 Integration Tests Results

Overall Success Rate: 96.1%

📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

📋 Detailed Results
- litellm_proxy_vertex_ai_gemini_3_pro_preview
- litellm_proxy_mistral_devstral_2512
  - Skipped Tests:
  - Failed Tests:
- litellm_proxy_deepseek_deepseek_chat
  - Skipped Tests:
- litellm_proxy_gpt_5.1_codex_max
  - Failed Tests:
- litellm_proxy_claude_sonnet_4_5_20250929
- litellm_proxy_moonshot_kimi_k2_thinking
  - Skipped Tests:
Note that this change requires a minor update to the condensation integration test.
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
Just to note, for fun and further thinking 😅. From the logs for codex:
Ahh. “each with own preamble per system rules”. I did think we can probably fix at least this bit:
🧪 Integration Tests Results

Overall Success Rate: 96.1%

📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

📋 Detailed Results
- litellm_proxy_moonshot_kimi_k2_thinking
  - Skipped Tests:
- litellm_proxy_mistral_devstral_2512
  - Skipped Tests:
  - Failed Tests:
- litellm_proxy_gpt_5.1_codex_max
  - Failed Tests:
- litellm_proxy_vertex_ai_gemini_3_pro_preview
- litellm_proxy_deepseek_deepseek_chat
  - Skipped Tests:
- litellm_proxy_claude_sonnet_4_5_20250929
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
🧪 Integration Tests Results

Overall Success Rate: 98.0%

📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

📋 Detailed Results
- litellm_proxy_vertex_ai_gemini_3_pro_preview
- litellm_proxy_gpt_5.1_codex_max
- litellm_proxy_claude_sonnet_4_5_20250929
- litellm_proxy_moonshot_kimi_k2_thinking
  - Skipped Tests:
- litellm_proxy_mistral_devstral_2512
  - Skipped Tests:
  - Failed Tests:
- litellm_proxy_deepseek_deepseek_chat
  - Skipped Tests:
@enyst ha! your fixes did work!
Evaluation Triggered

This PR is an attempt to solve an issue whereby the condenser breaks some API guarantees related to thinking blocks (#1438 and others). In extended thinking mode, Anthropic APIs expect the final assistant message to start with a thinking block, and will respond with a 400 error if that is not the case.
...except that's not actually what the Anthropic API expects. We regularly construct message lists that end in an assistant message without a thinking block, and they are accepted without issue. Instead, Anthropic relies on the concept of a tool loop (see their docs) to define the LLM's "turn", and will complain if we send a turn that doesn't start with a thinking block.
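To make the turn grouping concrete, here is a minimal sketch of how a flat message list can be partitioned into Anthropic-style turns. The message dicts and the `group_into_turns` helper are illustrative assumptions, not the SDK's actual event types or API:

```python
# Hypothetical sketch of the "turn" grouping described above. An assistant
# turn begins after a plain user message and stays open while the model
# keeps issuing tool calls and receiving tool results.

def group_into_turns(messages: list[dict]) -> list[list[dict]]:
    """Split a flat message list into Anthropic-style turns.

    A new turn starts at each user message that is *not* a tool result;
    tool results continue the current turn, so the turn's first assistant
    message (carrying the thinking block) is never separated from the
    tool calls that follow it.
    """
    turns: list[list[dict]] = []
    current: list[dict] = []
    for msg in messages:
        is_tool_result = msg["role"] == "user" and msg.get("type") == "tool_result"
        if msg["role"] == "user" and not is_tool_result and current:
            turns.append(current)
            current = []
        current.append(msg)
    if current:
        turns.append(current)
    return turns


conversation = [
    {"role": "user", "content": "fix the bug"},
    {"role": "assistant", "content": "[thinking] + tool call"},
    {"role": "user", "type": "tool_result", "content": "file contents"},
    {"role": "assistant", "content": "tool call"},
    {"role": "user", "type": "tool_result", "content": "tests pass"},
    {"role": "assistant", "content": "done"},
    {"role": "user", "content": "thanks, one more thing"},
]
# Everything from "fix the bug" through "done" is a single turn; condensing
# away part of it yields a turn that no longer starts with the thinking
# block, which is what triggers the 400.
```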
This PR addresses the issue by modifying the manipulation index calculations in the `View` class to ensure that the condenser does not split up tool loops. This may not be an ideal solution: the manipulation indices define a sort of "grid" the condenser can snap to, and making them tool-loop-aware significantly degrades the resolution of that grid. Instead of splitting on action/observation pairs (or batches thereof in parallel tool-calling mode), we now have long sequences that must be treated as an atomic unit.

A better solution would keep the resolution of the manipulation grid as fine as possible while still ensuring thinking blocks are preserved within turns. I don't think that's possible with the existing condensation setup, so I recommend we proceed with this PR for the moment and revisit when appropriate.
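As an illustration of the coarser grid, here is a hypothetical sketch of manipulation indices that treat tool loops as atomic: an index is a valid cut point only when it does not land inside a run of action/observation pairs. The event shapes and function name are assumptions for illustration, not the actual `View` implementation:

```python
# Hypothetical sketch: compute condenser "snap points" that never land
# inside a tool loop. Event dicts are illustrative, not SDK event types.

def manipulation_indices(events: list[dict]) -> list[int]:
    """Return indices where the condenser may safely cut the event list.

    An observation always continues its action's loop, and an action that
    immediately follows an observation continues the same loop, so whole
    tool loops become atomic units on the manipulation grid.
    """
    indices: list[int] = []
    prev_kind = None
    for i, event in enumerate(events):
        kind = event["kind"]
        continues_loop = kind == "observation" or (
            kind == "action" and prev_kind == "observation"
        )
        if not continues_loop:
            indices.append(i)
        prev_kind = kind
    return indices


events = [
    {"kind": "message"},      # user request: valid cut point
    {"kind": "action"},       # assistant opens a tool loop (thinking lives here)
    {"kind": "observation"},  # tool result: inside the loop
    {"kind": "action"},       # follow-up tool call: still the same loop
    {"kind": "observation"},  # tool result: inside the loop
    {"kind": "message"},      # next user message: valid cut point
]
# The loop spanning indices 1-4 is atomic: cutting at 2, 3, or 4 would
# separate the loop's opening action (and its thinking block) from the rest.
```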
Fix #1438
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.12-nodejs22
- golang:1.21-bookworm

Pull (multi-arch manifest)

```shell
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:9763ee3-python
```

Run
All tags pushed for this build
About Multi-Architecture Support
- Versioned tags (e.g. 9763ee3-python) are multi-arch manifests supporting both amd64 and arm64
- Architecture-specific tags (e.g. 9763ee3-python-amd64) are also available if needed