Skip to content

Conversation

@xingyaoww
Copy link
Collaborator

@xingyaoww xingyaoww commented Dec 18, 2025

Summary

This PR fixes a bug where condensation breaks action-observation pairs when an ObservationEvent is forgotten, causing the Anthropic API to return an error.

Problem

When condensation forgets an ObservationEvent, the corresponding ActionEvent is correctly filtered out by filter_unmatched_tool_calls. However, other ActionEvents in the same batch (same llm_response_id) were NOT filtered out, which breaks the Anthropic API requirement that tool_use blocks must have corresponding tool_result blocks.

Error Message

litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"messages.28: `tool_use` ids were found without `tool_result` blocks immediately after: toolu_01L5zJ74i3tPdZDVGoMzeMHm. Each `tool_use` block must have a corresponding `tool_result` block in the next message."}}

Reproduction Scenario

  1. Create a batch of 2 actions from the same LLM response (same llm_response_id):
    • Action1 (batch A) with tool_call_id="tool_call_1"
    • Action2 (batch A) with tool_call_id="tool_call_2"
  2. Create matching observations:
    • Obs1 with tool_call_id="tool_call_1"
    • Obs2 with tool_call_id="tool_call_2"
  3. Condensation forgets Obs1 (but not Action1, Action2, or Obs2)
  4. After filtering:
    • Action1 is filtered out (no matching Obs1)
    • Action2 is kept (has matching Obs2)
    • Obs2 is kept (has matching Action2)
  5. Result: [Action2, Obs2]

But Action1 and Action2 were originally in the same LLM response! When Action2 is sent to the LLM without Action1, it breaks the batch structure.

Solution

This fix adds a second pass in filter_unmatched_tool_calls to enforce batch atomicity after filtering. If any ActionEvent in a batch is filtered out (due to missing ObservationEvent), all ActionEvents in that batch are also filtered out, along with their corresponding ObservationEvents.

Changes

  1. Modified filter_unmatched_tool_calls to call a new _enforce_batch_atomicity_on_filtered method after the initial filtering
  2. Added _enforce_batch_atomicity_on_filtered method that:
    • Builds a map of llm_response_id -> list of ActionEvent IDs
    • Identifies batches where some (but not all) ActionEvents were filtered out
    • Removes all ActionEvents in those batches and their corresponding ObservationEvents

Testing

Added 4 new test cases in test_view_condensation_batch_atomicity.py:

  • test_batch_atomicity_when_observation_forgotten: Tests the main bug scenario
  • test_batch_atomicity_when_multiple_observations_forgotten: Tests with multiple forgotten observations
  • test_batch_atomicity_different_batches_independent: Tests that different batches are independent
  • test_single_action_batch_observation_forgotten: Tests single-action batches

All existing tests continue to pass (1419 tests).

Fixes #1449

@xingyaoww can click here to continue refining the PR


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:5f33286-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-5f33286-python \
  ghcr.io/openhands/agent-server:5f33286-python

All tags pushed for this build

ghcr.io/openhands/agent-server:5f33286-golang-amd64
ghcr.io/openhands/agent-server:5f33286-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:5f33286-golang-arm64
ghcr.io/openhands/agent-server:5f33286-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:5f33286-java-amd64
ghcr.io/openhands/agent-server:5f33286-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:5f33286-java-arm64
ghcr.io/openhands/agent-server:5f33286-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:5f33286-python-amd64
ghcr.io/openhands/agent-server:5f33286-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:5f33286-python-arm64
ghcr.io/openhands/agent-server:5f33286-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:5f33286-golang
ghcr.io/openhands/agent-server:5f33286-java
ghcr.io/openhands/agent-server:5f33286-python

About Multi-Architecture Support

  • Each variant tag (e.g., 5f33286-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 5f33286-python-amd64) are also available if needed

@github-actions
Copy link
Contributor

github-actions bot commented Dec 18, 2025

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/context
   view.py1423873%41, 46, 51–52, 57–58, 63–67, 83–87, 89, 150–151, 182, 193–194, 200, 203, 254–255, 257, 259, 281–284, 287, 289–290, 297, 299–300
TOTAL12921580455% 

@xingyaoww xingyaoww force-pushed the fix/condensation-batch-atomicity branch from 9fc5a33 to 89b0097 Compare December 19, 2025 03:23
When condensation forgets an ObservationEvent, the corresponding ActionEvent
is correctly filtered out by filter_unmatched_tool_calls. However, other
ActionEvents in the same batch (same llm_response_id) were NOT filtered out,
which breaks the Anthropic API requirement that tool_use blocks must have
corresponding tool_result blocks.

This fix refactors the batch atomicity logic:
1. Extract common batch-building logic into _build_action_batches helper
2. Reuse _enforce_batch_atomicity in filter_unmatched_tool_calls to ensure
   batch atomicity is enforced after filtering unmatched tool calls
3. Also remove ObservationEvents whose ActionEvents were removed due to
   batch atomicity

Also updated existing tests to properly mock the id and llm_response_id
attributes on ActionEvent mocks.

Fixes #1449

Co-authored-by: openhands <[email protected]>
@xingyaoww xingyaoww force-pushed the fix/condensation-batch-atomicity branch from 89b0097 to 4554540 Compare December 19, 2025 03:34
@xingyaoww xingyaoww marked this pull request as ready for review December 19, 2025 03:39
@csmith49
Copy link
Collaborator

csmith49 commented Dec 19, 2025

Hey @xingyaoww, this might be fine but do you think you could take a look at #1412 instead? It builds off of #1380, which also needs a look.

ETA: I don't think our fixes are incompatible, but the first linked PR sets out an approach that'll be super helpful for future condenser work as well.

Copy link
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking into this!

You know, I read the description and it made sense; I started reading the code and it made sense; then I went back to the error message and... maybe it doesn't actually explain it?
messages.28: tool_use ids were found without tool_result blocks immediately after: toolu_01L5zJ74i3tPdZDVGoMzeMHm.
Each tool_use block must have a corresponding tool_result block in the next message.

If we sent (action2, obs2), action2 has a tool result, it's obs2 🤔

It's also curious that it removed the obs, but it had the action... which means the action must have been in keep_first events, those we don't touch afaik. Possible!

💭 Could it be that it inserts the summary between them? This code:

summary_offset=self.keep_first,

@enyst
Copy link
Collaborator

enyst commented Dec 19, 2025

Oh, right, summary interfering between (so we send action2, summary, obs2) might have been exactly what happened in the issue linked to your PR, Calvin

@xingyaoww
Copy link
Collaborator Author

@csmith49 i do like your solution better! sorry for not checking that PR out early! I'll close this one

@xingyaoww xingyaoww closed this Dec 19, 2025
xingyaoww pushed a commit that referenced this pull request Dec 19, 2025
This commit migrates tests from PR #1450 and adds the necessary logic to
enforce batch atomicity when ObservationEvents are forgotten during
condensation.

Changes:
- Add _build_action_batches helper method to extract batch information
- Update _enforce_batch_atomicity to accept warn_on_violation parameter
- Update filter_unmatched_tool_calls to enforce batch atomicity after
  filtering unmatched tool calls
- Add test_view_condensation_batch_atomicity.py with tests for:
  - Batch atomicity when observation is forgotten
  - Batch atomicity when multiple observations are forgotten
  - Different batches remain independent
  - Single action batch observation forgotten
- Update existing tests to include llm_response_id and id attributes
  on mock ActionEvents

Co-authored-by: openhands <[email protected]>
@xingyaoww xingyaoww reopened this Dec 19, 2025
@xingyaoww
Copy link
Collaborator Author

ok after some discussion, @csmith49 think maybe we can merge this first & do a patch release so we can fix the CLI first before we merge his PR for a more fundamental fix

@neubig neubig removed their request for review December 19, 2025 16:19

This prevents partial batches from being sent to the LLM, which can cause
API errors when thinking blocks are separated from their tool calls.
) -> tuple[dict[str, list[str]], dict[str, str], dict[str, ToolCallID]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return type is pretty scary. Not necessary for this quick fix PR (just making a note for later), but we should either split this into three separate functions or build a data class that has each element of the tuple as a field and return that so callers can access the mappings by name.

# Second pass: enforce batch atomicity for ActionEvents
# If any ActionEvent in a batch is removed, all ActionEvents in that
# batch should also be removed
removed_event_ids = View._enforce_batch_atomicity(events, removed_event_ids)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call means the normal View.from_events function will call _enforce_batch_atomicity twice -- that's probably fine? Or maybe an indication that these functions need to be called after any change to the events.

Not a problem now, just leaving a note for the future integration work.

Copy link
Collaborator

@csmith49 csmith49 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo the typing downgrade from EventID -> str

@xingyaoww
Copy link
Collaborator Author

@OpenHands please address all the code review comments from @csmith49

@openhands-ai
Copy link

openhands-ai bot commented Dec 19, 2025

I'm on it! xingyaoww can track my progress at all-hands.dev

Address code review feedback from @csmith49 to use the proper EventID
type alias instead of raw str for event IDs and llm_response_id in the
_build_action_batches method.

Co-authored-by: openhands <[email protected]>
@openhands-ai
Copy link

openhands-ai bot commented Dec 19, 2025

I've addressed the code review comment from @csmith49 and pushed the changes to the PR.

Summary of Changes

Code Review Comment Addressed:

  • @csmith49 commented: "Think these str types should be EventID" on the _build_action_batches method

Changes Made:

  • Updated the _build_action_batches method in openhands-sdk/openhands/sdk/context/view.py to use the proper EventID type alias instead of raw str for:
    • Return type annotation: tuple[dict[EventID, list[EventID]], dict[EventID, EventID], dict[EventID, ToolCallID]]
    • Local variable type annotations: batches, action_id_to_response_id, and action_id_to_tool_call_id

Verification:

  • ✅ Pre-commit hooks pass (ruff format, ruff lint, pycodestyle, pyright, import dependency rules)
  • ✅ All 10 related batch atomicity tests pass

Note: The other two comments from @csmith49 were explicitly noted as "not necessary for this quick fix PR" and "not a problem now, just leaving a note for the future integration work" - so no action was required for those.

The changes have been pushed to the fix/condensation-batch-atomicity branch.

View full conversation

@xingyaoww xingyaoww enabled auto-merge (squash) December 19, 2025 17:38
@xingyaoww xingyaoww merged commit eb06d52 into main Dec 19, 2025
16 checks passed
@xingyaoww xingyaoww deleted the fix/condensation-batch-atomicity branch December 19, 2025 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Condensation breaks action-observation pairs when ObservationEvent is forgotten

5 participants