-
Notifications
You must be signed in to change notification settings - Fork 130
SDK: bound latest-user-message scan in Agent.step #1844
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: openhands <[email protected]>
Coverage Report •
|
||||||||||||||||||||||||||||||||||||||||
|
@OpenHands This PR's diff doesn't make sense to me. Why look up a single user message, when we could maybe save it somewhere or perhaps only its ID? Think about it. Install yourself Propose a fix and commit it here. Use remote gh. Then post a comment with what you did (don't tell me, post on this PR). Note that it will be rendered as markdown. |
|
I'm on it! enyst can track my progress at all-hands.dev |
UpdateI replaced the bounded event scan with a state-level tracker so we no longer touch at all for the hook check. What changed
Tests
tests/test_agent_step_bounded_scan.py .. [100%] =============================== warnings summary =============================== -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html |
Update (correction)I previously posted a comment that lost backticked identifiers. Reposting with proper markdown. I replaced the bounded event scan with a state-level What changed
Tests
|
|
Final summary (new changes since the prior update):
Checklist:
Notes:
|
all-hands-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
Good optimization to avoid scanning the entire event log! The core approach is sound and well-tested, but I found one critical issue that needs to be addressed.
🔴 Critical Issue: Inconsistent last_user_message_id Updates
Problem
The PR updates last_user_message_id in send_message() (line 477), but there's another location where user MessageEvent objects are created that was not updated:
Location: local_conversation.py lines 531-538 (not shown in this diff)
feedback_msg = MessageEvent(
source="user",
llm_message=Message(
role="user",
content=[TextContent(text=prefixed)],
),
)
self._on_event(feedback_msg) # ❌ last_user_message_id not updated!This occurs when a stop hook denies the agent from finishing and provides feedback.
Why This Matters
- The feedback message is a user message that goes through
_on_event() _on_event()can triggerUserPromptSubmithooks that might block the message- Without updating
last_user_message_id,Agent.step()will NOT detect if this feedback message was blocked - This creates inconsistent behavior: regular user messages are checked, but feedback messages are not
Required Fix
Add the same update pattern used in send_message():
feedback_msg = MessageEvent(
source="user",
llm_message=Message(
role="user",
content=[TextContent(text=prefixed)],
),
)
self._state.last_user_message_id = feedback_msg.id # ✅ Add this line
self._on_event(feedback_msg)🟠 Important: Backward Compatibility
Observation
In agent.py line 270, when last_user_message_id is None, the hook-blocked check is completely skipped:
if state.last_user_message_id is not None:
reason = state.pop_blocked_message(state.last_user_message_id)
# ...
# If None, no check happens at allThis can occur for:
- Old conversations loaded from disk (created before this change)
- Edge cases where user messages are created without updating the field
Impact
The original code would scan events even without last_user_message_id, potentially catching blocked messages. The new code skips the check entirely.
Recommendation
Add defensive logging to catch unexpected cases:
if state.last_user_message_id is not None:
reason = state.pop_blocked_message(state.last_user_message_id)
if reason is not None:
logger.info(f"User message blocked by hook: {reason}")
state.execution_status = ConversationExecutionStatus.FINISHED
return
else:
# Defensive check: warn if blocked messages exist but ID is missing
if state.blocked_messages:
logger.warning(
"Blocked messages exist but last_user_message_id is None. "
"This may indicate a bug or old conversation state."
)This helps identify bugs during the transition period.
🟡 Suggestions for Improvement
1. Test Coverage
Current tests manually set last_user_message_id but don't verify behavior when it's None. Consider adding:
def test_agent_step_handles_none_last_user_message_id(tmp_path):
"""Verify graceful handling when last_user_message_id is not set."""
agent = Agent(llm=LLM(model="gpt-4o-mini", api_key="x"), tools=[])
workspace = LocalWorkspace(working_dir=tmp_path)
conv = LocalConversation(agent=agent, workspace=workspace)
# Simulate old conversation without last_user_message_id
user_msg = MessageEvent(
source="user",
llm_message=Message(role="user", content=[TextContent(text="hi")]),
)
conv._on_event(user_msg)
# Intentionally don't set last_user_message_id
# Should not crash when None
agent.step(conv, on_event=conv._on_event)
# Verify expected behavior is documented2. Documentation Enhancement
The new field in state.py is well-documented, but could clarify its relationship to blocked_messages:
last_user_message_id: EventID | None = Field(
default=None,
description=(
"Most recent user MessageEvent ID for hook block checks. "
"Updated automatically when user messages are dispatched. "
"Used by Agent.step to avoid scanning the full event log. "
"If None, hook-blocked check is skipped (backward compatibility)."
),
)✅ What's Good
- Strong optimization: Avoids O(n) scan of event log for every step
- Comprehensive tests: Creative use of custom EventLog classes to verify non-iteration
- Clear motivation: PR description explains the performance issue well
- Server compatibility: RemoteConversation uses LocalConversation internally, so it inherits the fix automatically
Summary
| Priority | Issue | Status |
|---|---|---|
| 🔴 Critical | Update last_user_message_id for feedback messages (line 531-538) |
Must fix |
| 🟠 Important | Add defensive logging for None case | Recommended |
| 🟡 Suggestion | Add test for None scenario | Nice to have |
| 🟡 Suggestion | Enhance field documentation | Nice to have |
Action required: Fix the feedback message case to ensure consistent behavior across all user message creation paths.
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
| extended_content=extended_content, | ||
| sender=sender, | ||
| ) | ||
| self._state.last_user_message_id = user_msg_event.id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we modify _state, we should hold the lock for .state
I wonder if we can just modify the default on_event callback, so we will update it when we've dectected message event from user
xingyaoww
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comment - Once address this lgtm
Fixes #1839.
Summary
MessageEventID on conversation state to avoid scanningstate.eventswhen checking for hook-blocked prompts.Agent.stepto decide if aUserPromptSubmithook blocked the most recent user message.Agent.stepdoes not iterate the event log when the stored ID is present.Rationale
state.eventscan be file-backed/lazily-loaded. Materializing or iterating the full history to find the most recent user message forces deserialization of the entire log even though we only need a single message ID. Storing the latest user message ID in the state preserves the hook behavior without forcing a scan.Testing
python -m pytest -q tests/test_agent_step_bounded_scan.pypre-commit run -a@enyst can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:a56defa-pythonRun
All tags pushed for this build
About Multi-Architecture Support
a56defa-python) is a multi-arch manifest supporting both amd64 and arm64a56defa-python-amd64) are also available if needed