-
Notifications
You must be signed in to change notification settings - Fork 132
Fix batch atomicity when condensation forgets ObservationEvents #1450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9fc5a33 to
89b0097
Compare
When condensation forgets an ObservationEvent, the corresponding ActionEvent is correctly filtered out by filter_unmatched_tool_calls. However, other ActionEvents in the same batch (same llm_response_id) were NOT filtered out, which breaks the Anthropic API requirement that tool_use blocks must have corresponding tool_result blocks. This fix refactors the batch atomicity logic: 1. Extract common batch-building logic into _build_action_batches helper 2. Reuse _enforce_batch_atomicity in filter_unmatched_tool_calls to ensure batch atomicity is enforced after filtering unmatched tool calls 3. Also remove ObservationEvents whose ActionEvents were removed due to batch atomicity Also updated existing tests to properly mock the id and llm_response_id attributes on ActionEvent mocks. Fixes #1449 Co-authored-by: openhands <[email protected]>
89b0097 to
4554540
Compare
|
Hey @xingyaoww, this might be fine but do you think you could take a look at #1412 instead? It builds off of #1380, which also needs a look. ETA: I don't think our fixes are incompatible, but the first linked PR sets out an approach that'll be super helpful for future condenser work as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for looking into this!
You know, I read the description and it made sense; I started reading the code and it made sense; then I went back to the error message and... maybe it doesn't actually explain it?
messages.28: tool_use ids were found without tool_result blocks immediately after: toolu_01L5zJ74i3tPdZDVGoMzeMHm.
Each tool_use block must have a corresponding tool_result block in the next message.
If we sent (action2, obs2), action2 has a tool result, it's obs2 🤔
It's also curious that it removed the obs, but it had the action... which means the action must have been in keep_first events, those we don't touch afaik. Possible!
💭 Could it be that it inserts the summary between them? This code:
summary_offset=self.keep_first,
|
Oh, right, |
|
@csmith49 i do like your solution better! sorry for not checking that PR out early! I'll close this one |
This commit migrates tests from PR #1450 and adds the necessary logic to enforce batch atomicity when ObservationEvents are forgotten during condensation. Changes: - Add _build_action_batches helper method to extract batch information - Update _enforce_batch_atomicity to accept warn_on_violation parameter - Update filter_unmatched_tool_calls to enforce batch atomicity after filtering unmatched tool calls - Add test_view_condensation_batch_atomicity.py with tests for: - Batch atomicity when observation is forgotten - Batch atomicity when multiple observations are forgotten - Different batches remain independent - Single action batch observation forgotten - Update existing tests to include llm_response_id and id attributes on mock ActionEvents Co-authored-by: openhands <[email protected]>
|
ok after some discussion, @csmith49 think maybe we can merge this first & do a patch release so we can fix the CLI first before we merge his PR for a more fundamental fix |
|
|
||
| This prevents partial batches from being sent to the LLM, which can cause | ||
| API errors when thinking blocks are separated from their tool calls. | ||
| ) -> tuple[dict[str, list[str]], dict[str, str], dict[str, ToolCallID]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This return type is pretty scary. Not necessary for this quick fix PR (just making a note for later), but we should either split this into three separate functions or build a data class that has each element of the tuple as a field and return that so callers can access the mappings by name.
| # Second pass: enforce batch atomicity for ActionEvents | ||
| # If any ActionEvent in a batch is removed, all ActionEvents in that | ||
| # batch should also be removed | ||
| removed_event_ids = View._enforce_batch_atomicity(events, removed_event_ids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call means the normal View.from_events function will call _enforce_batch_atomicity twice -- that's probably fine? Or maybe an indication that these functions need to be called after any change to the events.
Not a problem now, just leaving a note for the future integration work.
csmith49
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, modulo the typing downgrade from EventID -> str
|
@OpenHands please address all the code review comments from @csmith49 |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
Address code review feedback from @csmith49 to use the proper EventID type alias instead of raw str for event IDs and llm_response_id in the _build_action_batches method. Co-authored-by: openhands <[email protected]>
|
I've addressed the code review comment from @csmith49 and pushed the changes to the PR. Summary of ChangesCode Review Comment Addressed:
Changes Made:
Verification:
Note: The other two comments from @csmith49 were explicitly noted as "not necessary for this quick fix PR" and "not a problem now, just leaving a note for the future integration work" - so no action was required for those. The changes have been pushed to the |
Summary
This PR fixes a bug where condensation breaks action-observation pairs when an
ObservationEventis forgotten, causing the Anthropic API to return an error.Problem
When condensation forgets an
ObservationEvent, the correspondingActionEventis correctly filtered out byfilter_unmatched_tool_calls. However, otherActionEvents in the same batch (samellm_response_id) were NOT filtered out, which breaks the Anthropic API requirement thattool_useblocks must have correspondingtool_resultblocks.Error Message
Reproduction Scenario
llm_response_id):Action1(batch A) withtool_call_id="tool_call_1"Action2(batch A) withtool_call_id="tool_call_2"Obs1withtool_call_id="tool_call_1"Obs2withtool_call_id="tool_call_2"Obs1(but notAction1,Action2, orObs2)Action1is filtered out (no matchingObs1)Action2is kept (has matchingObs2)Obs2is kept (has matchingAction2)[Action2, Obs2]But
Action1andAction2were originally in the same LLM response! WhenAction2is sent to the LLM withoutAction1, it breaks the batch structure.Solution
This fix adds a second pass in
filter_unmatched_tool_callsto enforce batch atomicity after filtering. If anyActionEventin a batch is filtered out (due to missingObservationEvent), allActionEvents in that batch are also filtered out, along with their correspondingObservationEvents.Changes
filter_unmatched_tool_callsto call a new_enforce_batch_atomicity_on_filteredmethod after the initial filtering_enforce_batch_atomicity_on_filteredmethod that:llm_response_id-> list ofActionEventIDsActionEvents were filtered outActionEvents in those batches and their correspondingObservationEventsTesting
Added 4 new test cases in
test_view_condensation_batch_atomicity.py:test_batch_atomicity_when_observation_forgotten: Tests the main bug scenariotest_batch_atomicity_when_multiple_observations_forgotten: Tests with multiple forgotten observationstest_batch_atomicity_different_batches_independent: Tests that different batches are independenttest_single_action_batch_observation_forgotten: Tests single-action batchesAll existing tests continue to pass (1419 tests).
Fixes #1449
@xingyaoww can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:5f33286-pythonRun
All tags pushed for this build
About Multi-Architecture Support
5f33286-python) is a multi-arch manifest supporting both amd64 and arm645f33286-python-amd64) are also available if needed