Add conversation.execute_tool() method for pre-run tool execution by xingyaoww · Pull Request #1833 · OpenHands/software-agent-sdk

xingyaoww · 2026-01-26T17:07:28Z

Summary

This PR adds a new execute_tool(tool_name, action) method to the Conversation class that allows executing tools before conversation.run() is called. This addresses issue #1831 where the 30_tom_agent.py example was failing because it tried to access tools_map before the agent was initialized.

The new method:

Automatically handles agent initialization (loads plugins and initializes tools)
Allows executing tools before the first run() call
Is useful for pre-run setup operations (e.g., indexing repositories)
Enables manual tool execution for environment setup
Supports testing tool behavior outside the agent loop

Changes

Add abstract execute_tool method to BaseConversation
Implement execute_tool in LocalConversation with automatic agent initialization
Add NotImplementedError stub in RemoteConversation (not yet supported for remote conversations)
Update 30_tom_agent.py example to use the new execute_tool method
Add comprehensive tests for execute_tool functionality

Example Usage

from openhands.sdk import LLM, Agent, Conversation
from openhands.tools.tom_consult import SleeptimeComputeAction

# Create conversation
conversation = Conversation(agent=agent, workspace=cwd)

# Execute tool before run() - agent is automatically initialized
result = conversation.execute_tool("sleeptime_compute", SleeptimeComputeAction())
print(f"Sessions processed: {result.sessions_processed}")

# Now run the conversation normally
conversation.send_message("Hello!")
conversation.run()

Fixes #1831

Checklist

If the PR is changing/adding functionality, are there tests to reflect this?
If there is an example, have you run the example to make sure that it works?
If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
Is the github CI passing?

@xingyaoww can click here to continue refining the PR

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:e70174f-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-e70174f-python \
  ghcr.io/openhands/agent-server:e70174f-python

All tags pushed for this build

ghcr.io/openhands/agent-server:e70174f-golang-amd64
ghcr.io/openhands/agent-server:e70174f-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:e70174f-golang-arm64
ghcr.io/openhands/agent-server:e70174f-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:e70174f-java-amd64
ghcr.io/openhands/agent-server:e70174f-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:e70174f-java-arm64
ghcr.io/openhands/agent-server:e70174f-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:e70174f-python-amd64
ghcr.io/openhands/agent-server:e70174f-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:e70174f-python-arm64
ghcr.io/openhands/agent-server:e70174f-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:e70174f-golang
ghcr.io/openhands/agent-server:e70174f-java
ghcr.io/openhands/agent-server:e70174f-python

About Multi-Architecture Support

Each variant tag (e.g., e70174f-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., e70174f-python-amd64) are also available if needed

This commit adds a new execute_tool(tool_name, action) method to the Conversation class that allows executing tools before conversation.run() is called. This is useful for: - Pre-run setup operations (e.g., indexing repositories) - Manual tool execution for environment setup - Testing tool behavior outside the agent loop Changes: - Add abstract execute_tool method to BaseConversation - Implement execute_tool in LocalConversation with automatic agent initialization - Add NotImplementedError stub in RemoteConversation (not yet supported) - Update 30_tom_agent.py example to use the new execute_tool method - Add comprehensive tests for execute_tool functionality Fixes #1831 Co-authored-by: openhands <[email protected]>

xingyaoww · 2026-02-03T16:56:01Z

Note: The 30_tom_agent.py example in the release PR #1884 is failing because it tries to access conversation.agent.tools_map before the agent is initialized. This PR should fix that issue once merged.

xingyaoww · 2026-02-03T16:59:52Z

@OpenHands please fix the failing actions on PR #1833 at branch openhands/add-execute-tool-method

openhands-ai · 2026-02-03T17:00:02Z

I'm on it! xingyaoww can track my progress at all-hands.dev

xingyaoww · 2026-02-03T17:02:20Z

@OpenHands /codereview-roasted

openhands-ai · 2026-02-03T17:02:30Z

I'm on it! xingyaoww can track my progress at all-hands.dev

Add the execute_tool method to MockConversation in test_base_span_management.py to fix pyright type checking errors. The abstract method was added to BaseConversation but the test mock class was missing the implementation. Co-authored-by: openhands <[email protected]>

openhands-ai · 2026-02-03T17:04:25Z

Code Review Summary for PR #1833

I performed a Linus Torvalds-style "roasted" code review of the execute_tool() method PR. Here's the verdict:

🔴 Taste Rating: Needs Improvement (but worth merging after fixes)

Critical Issue Found:

[remote_conversation.py, Lines 17-20] - The TYPE_CHECKING import block is placed awkwardly between regular imports instead of at the top with other imports. This is a basic Python hygiene issue that should be fixed.

Improvement Opportunities:

[30_tom_agent.py] - Error handling catches KeyError but ignores NotImplementedError which the docstring says can be raised
[local_conversation.py] - Tool executions via execute_tool don't appear in conversation history (may be intentional but should be documented)
[test_execute_tool.py] - 253 lines of test code for a 15-line method - significant boilerplate

What's Good:

Core design is correct: lazy agent initialization when execute_tool is called before run()
Solves a real problem pragmatically
Error messages include available tools (helpful for debugging)
NotImplementedError for RemoteConversation is appropriate

Verdict: ✅ Worth merging after addressing the import organization issue in remote_conversation.py.

Key Insight: The fundamental design is sound - lazy initialization solves the actual user problem without over-engineering. Just needs cleanup of rough edges.

No code changes were made - this was a review-only request.

View full conversation

github-actions · 2026-02-03T17:09:01Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/conversation/impl
local_conversation.py	325	17	94%	268, 273, 301, 366, 408, 554–555, 558, 704, 712, 714, 725, 727–729, 912–913
remote_conversation.py	558	107	80%	131, 158, 171, 173–176, 186, 208–209, 214–217, 293, 303–305, 311, 352, 484–487, 489, 509–513, 518–521, 524, 657–658, 662–663, 674, 693–694, 713, 732–735, 737–738, 756, 762–763, 767, 772–773, 779–781, 784–788, 790–791, 795, 797–805, 807, 844, 976–977, 981, 986–990, 996–1002, 1015–1016, 1092, 1099, 1105–1106, 1156, 1162–1163, 1167, 1170–1172, 1177–1178
TOTAL	16928	4965	70%

openhands-ai · 2026-02-03T17:09:45Z

Summary

I fixed the failing GitHub Actions on PR #1833 by addressing the pre-commit check failure.

Issue

The pyright type checker was failing because the MockConversation class in tests/sdk/conversation/test_base_span_management.py didn't implement the execute_tool abstract method that was added to BaseConversation as part of this PR.

Fix

Added the missing execute_tool method implementation to the MockConversation test class:

def execute_tool(self, tool_name: str, action: Action) -> Observation:
    """Mock implementation of execute_tool method."""
    raise NotImplementedError("Mock execute_tool not implemented")

Also added the necessary import: from openhands.sdk.tool.schema import Action, Observation

Changes

File modified: tests/sdk/conversation/test_base_span_management.py
Lines added: 5 (1 import line + 4 method lines)

CI Status

All 24 checks are now passing (1 skipped):

✅ pre-commit (was failing, now fixed)
✅ sdk-tests
✅ tools-tests
✅ agent-server-tests
✅ cross-tests
✅ check-examples
✅ All Build & Push jobs
✅ All other checks

The commit has been pushed to the openhands/add-execute-tool-method branch: #1833

View full conversation

xingyaoww · 2026-02-04T14:39:41Z

@OpenHands please do /codereview-roasted and uv run examples/01_standalone_sdk/30_tom_agent.py example using LLM_API_KEY in your env vars. Then tell me, if the changes in this PR gets it to work?

openhands-ai · 2026-02-04T14:39:50Z

I'm on it! xingyaoww can track my progress at all-hands.dev

openhands-ai · 2026-02-04T14:43:05Z

Summary

I ran the 30_tom_agent.py example and performed a Linus Torvalds-style code review of PR #1833.

✅ Does the PR Fix the Issue?

Yes, the PR successfully fixes the issue! The example ran to completion:

execute_tool worked correctly - It initialized the agent and executed sleeptime_compute before conversation.run() was called
The conversation ran successfully - The Tom agent provided guidance and the example completed with EXAMPLE_COST: 0.0395

Note: The example requires LLM_BASE_URL to be set (without trailing slash) when using the LiteLLM proxy:

LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" uv run examples/01_standalone_sdk/30_tom_agent.py

🔥 Linus Torvalds-Style Code Review

Taste Rating: 🟢 Good Taste - Elegant, simple solution

This PR demonstrates good engineering judgment. The problem was clear (accessing tools_map before agent initialization), and the solution is appropriately minimal.

[CRITICAL ISSUES] - None found

The core design is sound. The execute_tool method:

Properly encapsulates agent initialization via _ensure_agent_ready()
Provides a clean API that hides complexity from users
Follows the existing patterns in the codebase

[IMPROVEMENT OPPORTUNITIES]

[remote_conversation.py, Lines 17-20] 🔧 Import Organization
The TYPE_CHECKING import block is placed awkwardly between regular imports. Should be grouped with other typing imports at the top:

from typing import TYPE_CHECKING, SupportsIndex, overload
# ... other imports ...
if TYPE_CHECKING:
    from openhands.sdk.tool.schema import Action, Observation

[local_conversation.py, Line 897-906] 💡 Error Message Enhancement
The KeyError message is good but could include a hint about using execute_tool before run():

raise KeyError(
    f"Tool '{tool_name}' not found. Available tools: {available_tools}. "
    f"Note: execute_tool() can be called before run() to execute tools early."
)

[STYLE NOTES]

[test_execute_tool.py] ✅ Good Test Coverage
The tests are comprehensive and cover:

Basic functionality
Agent initialization
Before/after send_message
Tool not found error
Multiple calls
Conversation context passing

VERDICT: ✅ Worth Merging

Core logic is sound, the solution is elegant, and it solves a real problem. The minor import organization issue doesn't block merging.

KEY INSIGHT:

The PR correctly identifies that the real problem was premature access to internal state (tools_map), and solves it by providing a proper public API (execute_tool) that handles initialization automatically. This is good API design - hiding complexity while providing a clean interface.

View full conversation

all-hands-bot

Overall solid implementation that solves the issue of executing tools before run(). The code is well-tested and the API design is clean. Found a few areas for improvement around error handling and documentation consistency.

all-hands-bot · 2026-02-04T14:43:10Z

openhands-sdk/openhands/sdk/conversation/base.py

+        Returns:
+            The observation returned by the tool execution
+
+        Raises:


🟡 Suggestion: The docstring mentions "Raises: NotImplementedError: If the tool has no executor", but this exception is never raised in the LocalConversation implementation. Either add the check in the implementation or remove this from the docstring for accuracy.

all-hands-bot · 2026-02-04T14:43:11Z

openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py

+            )
+
+        # Execute the tool
+        return tool(action, self)


🟡 Suggestion: Consider adding a check to verify the tool has an executor before calling it, or clarify in the docstring what happens when executor is None. This would make the error behavior more explicit:

Suggested change

return tool(action, self)

# Execute the tool

if not tool.executor:

raise NotImplementedError(f"Tool '{tool_name}' has no executor")

return tool(action, self)

all-hands-bot · 2026-02-04T14:43:11Z

examples/01_standalone_sdk/30_tom_agent.py

+# Using execute_tool allows running tools before conversation.run()
+print("\nRunning sleeptime compute to index conversations...")
+try:
+    sleeptime_result = conversation.execute_tool(
+        "sleeptime_compute", SleeptimeComputeAction()
    )
-    print(f"Result: {sleeptime_result.message}")
-    print(f"Sessions processed: {sleeptime_result.sessions_processed}")
+    # Cast to the expected observation type for type-safe access
+    if isinstance(sleeptime_result, SleeptimeComputeObservation):
+        print(f"Result: {sleeptime_result.message}")
+        print(f"Sessions processed: {sleeptime_result.sessions_processed}")
+    else:
+        print(f"Result: {sleeptime_result.text}")
+except KeyError as e:
+    print(f"Tool not available: {e}")


🟠 Important: The error handling only catches KeyError, but tool execution itself could fail with other exceptions (e.g., if the tool logic raises an error). Consider catching a broader exception or documenting that tool execution errors will propagate:

Suggested change

# Using execute_tool allows running tools before conversation.run()

print("\nRunning sleeptime compute to index conversations...")

try:

sleeptime_result = conversation.execute_tool(

"sleeptime_compute", SleeptimeComputeAction()

)

print(f"Result: {sleeptime_result.message}")

print(f"Sessions processed: {sleeptime_result.sessions_processed}")

# Cast to the expected observation type for type-safe access

if isinstance(sleeptime_result, SleeptimeComputeObservation):

print(f"Result: {sleeptime_result.message}")

print(f"Sessions processed: {sleeptime_result.sessions_processed}")

else:

print(f"Result: {sleeptime_result.text}")

except KeyError as e:

print(f"Tool not available: {e}")

print("\nRunning sleeptime compute to index conversations...")

try:

sleeptime_result = conversation.execute_tool(

"sleeptime_compute", SleeptimeComputeAction()

)

# Cast to the expected observation type for type-safe access

if isinstance(sleeptime_result, SleeptimeComputeObservation):

print(f"Result: {sleeptime_result.message}")

print(f"Sessions processed: {sleeptime_result.sessions_processed}")

else:

print(f"Result: {sleeptime_result.text}")

except KeyError as e:

print(f"Tool not available: {e}")

except Exception as e:

print(f"Tool execution failed: {e}")

all-hands-bot · 2026-02-04T14:43:11Z

examples/01_standalone_sdk/30_tom_agent.py

+    # Cast to the expected observation type for type-safe access
+    if isinstance(sleeptime_result, SleeptimeComputeObservation):
+        print(f"Result: {sleeptime_result.message}")
+        print(f"Sessions processed: {sleeptime_result.sessions_processed}")
+    else:
+        print(f"Result: {sleeptime_result.text}")


🟢 Nit: The isinstance check with fallback to .text suggests uncertainty about the return type. Since execute_tool() returns Observation, all observations should have a .text attribute. Consider simplifying this unless there's a specific reason for the type checking:

Suggested change

# Cast to the expected observation type for type-safe access

if isinstance(sleeptime_result, SleeptimeComputeObservation):

print(f"Result: {sleeptime_result.message}")

print(f"Sessions processed: {sleeptime_result.sessions_processed}")

else:

print(f"Result: {sleeptime_result.text}")

if isinstance(sleeptime_result, SleeptimeComputeObservation):

print(f"Result: {sleeptime_result.message}")

print(f"Sessions processed: {sleeptime_result.sessions_processed}")

Or if you want to always show the text as fallback:

Suggested change

# Cast to the expected observation type for type-safe access

if isinstance(sleeptime_result, SleeptimeComputeObservation):

print(f"Result: {sleeptime_result.message}")

print(f"Sessions processed: {sleeptime_result.sessions_processed}")

else:

print(f"Result: {sleeptime_result.text}")

print(f"Result: {sleeptime_result.text}")

if isinstance(sleeptime_result, SleeptimeComputeObservation):

print(f"Sessions processed: {sleeptime_result.sessions_processed}")

all-hands-bot · 2026-02-04T14:43:11Z

tests/sdk/conversation/local/test_execute_tool.py

+    # Verify conversation was passed (result should contain conversation ID)
+    assert "conv_id:" in result.text
+    assert isinstance(result, ExecuteToolTestObservation)
+    assert result.result == "context_test"


🟢 Nit: Consider adding a test case for when a tool exists but has no executor (if that's a valid scenario). This would help verify the error handling is correct for that edge case.

enyst · 2026-02-04T18:04:49Z

@OpenHands Do a /codereview-roasted on this PR diff.

openhands-ai · 2026-02-04T18:05:00Z

I'm on it! enyst can track my progress at all-hands.dev

enyst · 2026-02-04T18:09:56Z

🟡 Taste Rating: Acceptable (but with some sharp edges)

You fixed the real problem (tools_map before init) without overengineering, so kudos. But the new API has a couple of “you’re gonna regret this later” quirks. Here’s the blunt review.

[CRITICAL ISSUES]

None. This won’t blow up production on merge.

[IMPROVEMENT OPPORTUNITIES]

[openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py, Lines 894–906] Bypasses confirmation/security flow: execute_tool() runs the tool directly, skipping the confirmation policy + security analyzer path the agent loop uses. That’s fine for power users, but it’s a silent behavioral change: destructive tools can now run with zero confirmation. At minimum, document that this method bypasses confirmation. Better: add an optional confirm: bool or pipe through the same confirmation policy used in Agent.step().
[tests/sdk/conversation/local/test_execute_tool.py, Lines 78–80 & 242–243] Global registry pollution: register_tool() is called at import time and never cleaned up. That means this test module permanently mutates the global tool registry, which can bleed into other tests and make order-dependence bugs. Use a fixture that snapshots/restores the registry, or register under a randomized name to avoid collisions.
[openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py, Lines 6–18] Import hygiene: The TYPE_CHECKING block is dropped mid‑import with extra blank lines. It’s minor, but it’s sloppy. Keep standard imports, then typing helpers, then local imports. You’re not in a Jupyter notebook—follow the style guide.

[STYLE NOTES]

[examples/01_standalone_sdk/30_tom_agent.py, Lines 71–86] The example now prints a generic “Result: …” for non‑SleeptimeComputeObservation results. That’s fine, but for a tutorial, you want deterministic output. Consider explicitly handling the Observation.text case with a clear label (e.g., Result text:) so readers don’t mistake it for the structured result.

VERDICT:

✅ Worth merging once you acknowledge (or document) the confirmation bypass and stop leaking tools into the global registry from tests.

Key insight: You fixed the initialization bug, but the new API is a “sharp knife.” Make sure users understand it cuts straight through your safety rails.

Ensure test registry registrations are reset per test to avoid cross-test pollution.

openhands-ai bot mentioned this pull request Jan 26, 2026

30_tom_agent.py example fails: tools_map accessed before agent initialization #1831

Open

xingyaoww mentioned this pull request Feb 3, 2026

Release v1.11.0 #1884

Merged

11 tasks

Merge branch 'main' into openhands/add-execute-tool-method

675de81

xingyaoww marked this pull request as ready for review February 4, 2026 14:38

Merge branch 'main' into openhands/add-execute-tool-method

01e867d

all-hands-bot reviewed Feb 4, 2026

View reviewed changes

OpenHands deleted a comment from openhands-ai bot Feb 4, 2026

This comment was marked as duplicate.

Sign in to view

enyst added 2 commits February 4, 2026 19:25

Fix execute_tool test registry isolation

90459ba

Ensure test registry registrations are reset per test to avoid cross-test pollution.

Merge branch 'main' into openhands/add-execute-tool-method

98ec050

Conversation

xingyaoww commented Jan 26, 2026 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Example Usage

Checklist

Uh oh!

xingyaoww commented Feb 3, 2026

Uh oh!

xingyaoww commented Feb 3, 2026

Uh oh!

openhands-ai bot commented Feb 3, 2026

Uh oh!

xingyaoww commented Feb 3, 2026

Uh oh!

openhands-ai bot commented Feb 3, 2026

Uh oh!

openhands-ai bot commented Feb 3, 2026

Code Review Summary for PR #1833

🔴 Taste Rating: Needs Improvement (but worth merging after fixes)

Critical Issue Found:

Improvement Opportunities:

What's Good:

Uh oh!

github-actions bot commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openhands-ai bot commented Feb 3, 2026

Summary

Issue

Fix

Changes

CI Status

Uh oh!

xingyaoww commented Feb 4, 2026

Uh oh!

openhands-ai bot commented Feb 4, 2026

Uh oh!

openhands-ai bot commented Feb 4, 2026

Summary

✅ Does the PR Fix the Issue?

🔥 Linus Torvalds-Style Code Review

Taste Rating: 🟢 Good Taste - Elegant, simple solution

[CRITICAL ISSUES] - None found

[IMPROVEMENT OPPORTUNITIES]

[STYLE NOTES]

VERDICT: ✅ Worth Merging

KEY INSIGHT:

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

all-hands-bot Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

all-hands-bot Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

all-hands-bot Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

all-hands-bot Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

all-hands-bot Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

enyst commented Feb 4, 2026

Uh oh!

openhands-ai bot commented Feb 4, 2026

Uh oh!

enyst commented Feb 4, 2026

[CRITICAL ISSUES]

[IMPROVEMENT OPPORTUNITIES]

[STYLE NOTES]

VERDICT:

Uh oh!

This comment was marked as duplicate.

Reviewers

xingyaoww commented Jan 26, 2026 •

edited by github-actions bot

Loading

github-actions bot commented Feb 3, 2026 •

edited

Loading