Skip to content

Conversation

@neubig
Copy link
Contributor

@neubig neubig commented Jan 9, 2026

Problem

Non-JSON formatted logs were appearing in Datadog for service:agent-server. Investigation revealed that exceptions from conversation.run() were not being caught, causing Python's default exception handler to print tracebacks directly to stderr as plain text, bypassing the JSON logger.

Example of the problematic logs (each line appearing as a separate log entry with Logger: unknown):

Traceback (most recent call last):
File "openhands/sdk/conversation/impl/local_conversation.py", line 411, in run
...
openhands.sdk.conversation.exceptions.ConversationRunError: Conversation run failed for id=...

Root Cause

Two places in event_service.py were calling conversation.run() without proper exception handling:

  1. Line 314 (send_message): loop.run_in_executor(None, self._conversation.run) - fire-and-forget call with NO exception handling
  2. Lines 499-500 (run method): Had a try/except but used logger.error(f"...") which doesn't include the traceback

Changes

  1. event_service.py send_message(): Wrapped the fire-and-forget run_in_executor call in an async task with exception handling. This preserves the original fire-and-forget behavior while adding proper error logging.

  2. event_service.py run(): Changed logger.error(f"Error during conversation run: {e}") to logger.exception("Error during conversation run") to include the full traceback in structured JSON logs.

  3. Updated test: Modified test_send_message_with_run_true_agent_idle to work with the new async wrapper pattern.

Result

After this fix:

  • All exceptions from conversation.run() will be caught and logged through the JSON logger
  • Full tracebacks will be included in the structured log output
  • The original fire-and-forget behavior is preserved (no behavioral changes)

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:aee005c-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-aee005c-python \
  ghcr.io/openhands/agent-server:aee005c-python

All tags pushed for this build

ghcr.io/openhands/agent-server:aee005c-golang-amd64
ghcr.io/openhands/agent-server:aee005c-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:aee005c-golang-arm64
ghcr.io/openhands/agent-server:aee005c-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:aee005c-java-amd64
ghcr.io/openhands/agent-server:aee005c-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:aee005c-java-arm64
ghcr.io/openhands/agent-server:aee005c-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:aee005c-python-amd64
ghcr.io/openhands/agent-server:aee005c-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:aee005c-python-arm64
ghcr.io/openhands/agent-server:aee005c-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:aee005c-golang
ghcr.io/openhands/agent-server:aee005c-java
ghcr.io/openhands/agent-server:aee005c-python

About Multi-Architecture Support

  • Each variant tag (e.g., aee005c-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., aee005c-python-amd64) are also available if needed

The core issue was that ConversationRunError exceptions were not being
caught, causing Python's default exception handler to print tracebacks
directly to stderr as plain text, bypassing the JSON logger.

Changes:
1. event_service.py line 314: Changed from fire-and-forget
   loop.run_in_executor() to await self.run() which has proper
   exception handling

2. event_service.py line 499-500: Changed logger.error() to
   logger.exception() to include full traceback in structured logs

3. Updated test to match new send_message behavior

Co-authored-by: openhands <[email protected]>
@neubig neubig force-pushed the fix-conversation-run-exception-handling branch from f516fe1 to 79c564b Compare January 9, 2026 21:39
@github-actions
Copy link
Contributor

github-actions bot commented Jan 9, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-agent-server/openhands/agent_server
   event_service.py32016448%55–56, 75–77, 81–86, 89–92, 107, 123, 127, 131–132, 139, 141, 148–149, 157–160, 167–169, 186, 210–211, 214–215, 217–219, 221, 226, 229–230, 233–235, 238, 242–244, 246, 248, 259–262, 275–276, 279–280, 283, 286–288, 291–292, 295–296, 300, 303, 307, 311–312, 314, 316–320, 328, 345–346, 363, 365, 369–371, 375, 384–385, 387, 391, 397, 399, 407–412, 461, 463–466, 475, 491, 498, 502, 513–514, 524–527, 529–530, 534, 536, 540–543, 548–550, 552, 556–559, 563–566, 574–577, 596–597, 599–606, 608–609, 618–619, 621–622, 629–630, 632–633, 637, 643, 653–654, 661
TOTAL14843711652% 

The core issue was that exceptions from conversation.run() were not being
caught, causing Python's default exception handler to print tracebacks
directly to stderr as plain text, bypassing the JSON logger.

Changes:
1. event_service.py send_message(): Wrapped the fire-and-forget
   run_in_executor call in an async task with exception handling.
   Uses loop.create_task() to ensure the task runs on the same event
   loop. This preserves the original fire-and-forget behavior while
   adding proper error logging.

2. event_service.py run(): Changed logger.error() to logger.exception()
   to include full traceback in structured logs.

3. Updated test to work with the new async wrapper pattern.

Co-authored-by: openhands <[email protected]>
@neubig neubig force-pushed the fix-conversation-run-exception-handling branch from 3ad7729 to dea903e Compare January 9, 2026 21:48
@neubig neubig marked this pull request as ready for review January 9, 2026 21:52
Copy link
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception handling improvements look good overall, but there are a few concerns around task tracking, test reliability, and test coverage. See inline comments for details.

except Exception:
logger.exception("Error during conversation run from send_message")

loop.create_task(_run_with_error_handling())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Untracked fire-and-forget task

The task created here is not stored anywhere, unlike the similar pattern in the run() method (line 515) which stores it in self._run_task. This creates several issues:

  • Cannot wait for this task during shutdown (see close() method at line 575)
  • Task could be cancelled during service teardown without proper cleanup
  • No visibility into whether the task completed or failed
  • No way to prevent multiple concurrent runs from send_message() (vs. the check at line 495 for run())

Suggestion: Consider either:

  1. Store these tasks in a set/list (e.g., self._background_tasks) and clean them up in close()
  2. At minimum, add a comment explaining why this is intentionally fire-and-forget without tracking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with 2.

# Verify run was called via executor since agent is idle
mock_loop.run_in_executor.assert_any_call(None, conversation.run)
# Give the background task a chance to run
await asyncio.sleep(0.01)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Flaky timing-based test

Using asyncio.sleep(0.01) creates a timing-dependent test that could be flaky on slow CI systems or under heavy load. 10ms may not be enough time for the executor to run the task on an overloaded system.

Suggestion: Consider a more deterministic approach, such as mocking loop.create_task to capture and directly await the created task, or using a longer timeout with asyncio.wait_for().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No mocking, but if you think that wait_for is a better option go for it.

assert mock_loop.run_in_executor.call_count == 1 # Only send_message call

@pytest.mark.asyncio
async def test_send_message_with_run_true_agent_idle(self, event_service):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Missing exception handling test

This test does not verify that exceptions from conversation.run() are properly caught and logged. There's a similar test at lines 1105-1125 that tests the run() method's exception handling, but no equivalent for send_message().

Suggestion: Add a test case where conversation.run.side_effect = RuntimeError("Test error") to verify that:

  1. The exception is caught
  2. The exception is logged via logger.exception()
  3. The service continues functioning after the error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, check that please.

@neubig
Copy link
Contributor Author

neubig commented Jan 10, 2026

@OpenHands see the review comments and my follow ups, revise, update the PR, and make sure that CI passes.

@openhands-ai
Copy link

openhands-ai bot commented Jan 10, 2026

I'm on it! neubig can track my progress at all-hands.dev

- Add comment explaining fire-and-forget pattern for send_message() task
- Replace asyncio.sleep with asyncio.wait_for for more deterministic testing
- Add test for exception handling in send_message() with run=True

Co-authored-by: openhands <[email protected]>
@openhands-ai
Copy link

openhands-ai bot commented Jan 10, 2026

I've addressed all the PR review comments and pushed the changes to the branch. Here's a summary of what was done:

Changes Made (PR #1680)

1. ✅ Added comment explaining fire-and-forget pattern

File: openhands-agent-server/openhands/agent_server/event_service.py

Added a comment above loop.create_task(_run_with_error_handling()) explaining why the task is intentionally not tracked (per neubig's choice of option 2):

# Fire-and-forget: This task is intentionally not tracked because
# send_message() is designed to return immediately after queuing the
# message. The conversation run happens in the background and any
# errors are logged. Unlike the run() method which is explicitly
# awaited, this pattern allows clients to send messages without
# blocking on the full conversation execution.

2. ✅ Improved test timing with asyncio.wait_for

File: tests/agent_server/test_event_service.py

Replaced the flaky asyncio.sleep(0.01) with a more deterministic polling approach using asyncio.wait_for():

async def wait_for_run_called():
    while not conversation.run.called:
        await asyncio.sleep(0.001)

await asyncio.wait_for(wait_for_run_called(), timeout=1.0)

3. ✅ Added exception handling test for send_message()

File: tests/agent_server/test_event_service.py

Added new test test_send_message_with_run_true_logs_exception that verifies:

  • Exceptions from conversation.run() are caught
  • Exceptions are logged via logger.exception()
  • The service continues functioning after the error

Verification

  • All 57 tests in test_event_service.py pass
  • Pre-commit hooks (ruff format, ruff lint, pycodestyle, pyright) all pass
  • Changes have been pushed to origin/fix-conversation-run-exception-handling

The PR is now updated and ready for CI to run.

View full conversation

@neubig neubig requested a review from tofarr January 14, 2026 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants