Skip to content

Add conversation.execute_tool() method for pre-run tool execution#1833

Open
xingyaoww wants to merge 6 commits intomainfrom
openhands/add-execute-tool-method
Open

Add conversation.execute_tool() method for pre-run tool execution#1833
xingyaoww wants to merge 6 commits intomainfrom
openhands/add-execute-tool-method

Conversation

@xingyaoww
Copy link
Collaborator

@xingyaoww xingyaoww commented Jan 26, 2026

Summary

This PR adds a new execute_tool(tool_name, action) method to the Conversation class that allows executing tools before conversation.run() is called. This addresses issue #1831 where the 30_tom_agent.py example was failing because it tried to access tools_map before the agent was initialized.

The new method:

  • Automatically handles agent initialization (loads plugins and initializes tools)
  • Allows executing tools before the first run() call
  • Is useful for pre-run setup operations (e.g., indexing repositories)
  • Enables manual tool execution for environment setup
  • Supports testing tool behavior outside the agent loop

Changes

  • Add abstract execute_tool method to BaseConversation
  • Implement execute_tool in LocalConversation with automatic agent initialization
  • Add NotImplementedError stub in RemoteConversation (not yet supported for remote conversations)
  • Update 30_tom_agent.py example to use the new execute_tool method
  • Add comprehensive tests for execute_tool functionality

Example Usage

from openhands.sdk import LLM, Agent, Conversation
from openhands.tools.tom_consult import SleeptimeComputeAction

# Create conversation
conversation = Conversation(agent=agent, workspace=cwd)

# Execute tool before run() - agent is automatically initialized
result = conversation.execute_tool("sleeptime_compute", SleeptimeComputeAction())
print(f"Sessions processed: {result.sessions_processed}")

# Now run the conversation normally
conversation.send_message("Hello!")
conversation.run()

Fixes #1831

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

@xingyaoww can click here to continue refining the PR


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:e70174f-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-e70174f-python \
  ghcr.io/openhands/agent-server:e70174f-python

All tags pushed for this build

ghcr.io/openhands/agent-server:e70174f-golang-amd64
ghcr.io/openhands/agent-server:e70174f-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:e70174f-golang-arm64
ghcr.io/openhands/agent-server:e70174f-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:e70174f-java-amd64
ghcr.io/openhands/agent-server:e70174f-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:e70174f-java-arm64
ghcr.io/openhands/agent-server:e70174f-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:e70174f-python-amd64
ghcr.io/openhands/agent-server:e70174f-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:e70174f-python-arm64
ghcr.io/openhands/agent-server:e70174f-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:e70174f-golang
ghcr.io/openhands/agent-server:e70174f-java
ghcr.io/openhands/agent-server:e70174f-python

About Multi-Architecture Support

  • Each variant tag (e.g., e70174f-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., e70174f-python-amd64) are also available if needed

This commit adds a new execute_tool(tool_name, action) method to the
Conversation class that allows executing tools before conversation.run()
is called. This is useful for:
- Pre-run setup operations (e.g., indexing repositories)
- Manual tool execution for environment setup
- Testing tool behavior outside the agent loop

Changes:
- Add abstract execute_tool method to BaseConversation
- Implement execute_tool in LocalConversation with automatic agent initialization
- Add NotImplementedError stub in RemoteConversation (not yet supported)
- Update 30_tom_agent.py example to use the new execute_tool method
- Add comprehensive tests for execute_tool functionality

Fixes #1831

Co-authored-by: openhands <[email protected]>
Copy link
Collaborator Author

Note: The 30_tom_agent.py example in the release PR #1884 is failing because it tries to access conversation.agent.tools_map before the agent is initialized. This PR should fix that issue once merged.

@xingyaoww
Copy link
Collaborator Author

@OpenHands please fix the failing actions on PR #1833 at branch openhands/add-execute-tool-method

@openhands-ai
Copy link

openhands-ai bot commented Feb 3, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

@xingyaoww
Copy link
Collaborator Author

@OpenHands /codereview-roasted

@openhands-ai
Copy link

openhands-ai bot commented Feb 3, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

Add the execute_tool method to MockConversation in test_base_span_management.py
to fix pyright type checking errors. The abstract method was added to
BaseConversation but the test mock class was missing the implementation.

Co-authored-by: openhands <[email protected]>
@openhands-ai
Copy link

openhands-ai bot commented Feb 3, 2026

Code Review Summary for PR #1833

I performed a Linus Torvalds-style "roasted" code review of the execute_tool() method PR. Here's the verdict:

🔴 Taste Rating: Needs Improvement (but worth merging after fixes)


Critical Issue Found:

[remote_conversation.py, Lines 17-20] - The TYPE_CHECKING import block is placed awkwardly between regular imports instead of at the top with other imports. This is a basic Python hygiene issue that should be fixed.

Improvement Opportunities:

  1. [30_tom_agent.py] - Error handling catches KeyError but ignores NotImplementedError which the docstring says can be raised
  2. [local_conversation.py] - Tool executions via execute_tool don't appear in conversation history (may be intentional but should be documented)
  3. [test_execute_tool.py] - 253 lines of test code for a 15-line method - significant boilerplate

What's Good:

  • Core design is correct: lazy agent initialization when execute_tool is called before run()
  • Solves a real problem pragmatically
  • Error messages include available tools (helpful for debugging)
  • NotImplementedError for RemoteConversation is appropriate

Verdict: ✅ Worth merging after addressing the import organization issue in remote_conversation.py.

Key Insight: The fundamental design is sound - lazy initialization solves the actual user problem without over-engineering. Just needs cleanup of rough edges.


No code changes were made - this was a review-only request.

View full conversation

@github-actions
Copy link
Contributor

github-actions bot commented Feb 3, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/conversation/impl
   local_conversation.py3251794%268, 273, 301, 366, 408, 554–555, 558, 704, 712, 714, 725, 727–729, 912–913
   remote_conversation.py55810780%131, 158, 171, 173–176, 186, 208–209, 214–217, 293, 303–305, 311, 352, 484–487, 489, 509–513, 518–521, 524, 657–658, 662–663, 674, 693–694, 713, 732–735, 737–738, 756, 762–763, 767, 772–773, 779–781, 784–788, 790–791, 795, 797–805, 807, 844, 976–977, 981, 986–990, 996–1002, 1015–1016, 1092, 1099, 1105–1106, 1156, 1162–1163, 1167, 1170–1172, 1177–1178
TOTAL16928496570% 

@openhands-ai
Copy link

openhands-ai bot commented Feb 3, 2026

Summary

I fixed the failing GitHub Actions on PR #1833 by addressing the pre-commit check failure.

Issue

The pyright type checker was failing because the MockConversation class in tests/sdk/conversation/test_base_span_management.py didn't implement the execute_tool abstract method that was added to BaseConversation as part of this PR.

Fix

Added the missing execute_tool method implementation to the MockConversation test class:

def execute_tool(self, tool_name: str, action: Action) -> Observation:
    """Mock implementation of execute_tool method."""
    raise NotImplementedError("Mock execute_tool not implemented")

Also added the necessary import: from openhands.sdk.tool.schema import Action, Observation

Changes

  • File modified: tests/sdk/conversation/test_base_span_management.py
  • Lines added: 5 (1 import line + 4 method lines)

CI Status

All 24 checks are now passing (1 skipped):

  • ✅ pre-commit (was failing, now fixed)
  • ✅ sdk-tests
  • ✅ tools-tests
  • ✅ agent-server-tests
  • ✅ cross-tests
  • ✅ check-examples
  • ✅ All Build & Push jobs
  • ✅ All other checks

The commit has been pushed to the openhands/add-execute-tool-method branch: #1833

View full conversation

@xingyaoww xingyaoww marked this pull request as ready for review February 4, 2026 14:38
@xingyaoww
Copy link
Collaborator Author

@OpenHands please do /codereview-roasted and uv run examples/01_standalone_sdk/30_tom_agent.py example using LLM_API_KEY in your env vars. Then tell me, if the changes in this PR gets it to work?

@openhands-ai
Copy link

openhands-ai bot commented Feb 4, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

@openhands-ai
Copy link

openhands-ai bot commented Feb 4, 2026

Summary

I ran the 30_tom_agent.py example and performed a Linus Torvalds-style code review of PR #1833.

✅ Does the PR Fix the Issue?

Yes, the PR successfully fixes the issue! The example ran to completion:

  1. execute_tool worked correctly - It initialized the agent and executed sleeptime_compute before conversation.run() was called
  2. The conversation ran successfully - The Tom agent provided guidance and the example completed with EXAMPLE_COST: 0.0395

Note: The example requires LLM_BASE_URL to be set (without trailing slash) when using the LiteLLM proxy:

LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" uv run examples/01_standalone_sdk/30_tom_agent.py

🔥 Linus Torvalds-Style Code Review

Taste Rating: 🟢 Good Taste - Elegant, simple solution

This PR demonstrates good engineering judgment. The problem was clear (accessing tools_map before agent initialization), and the solution is appropriately minimal.


[CRITICAL ISSUES] - None found

The core design is sound. The execute_tool method:

  • Properly encapsulates agent initialization via _ensure_agent_ready()
  • Provides a clean API that hides complexity from users
  • Follows the existing patterns in the codebase

[IMPROVEMENT OPPORTUNITIES]

[remote_conversation.py, Lines 17-20] 🔧 Import Organization
The TYPE_CHECKING import block is placed awkwardly between regular imports. Should be grouped with other typing imports at the top:

from typing import TYPE_CHECKING, SupportsIndex, overload
# ... other imports ...
if TYPE_CHECKING:
    from openhands.sdk.tool.schema import Action, Observation

[local_conversation.py, Line 897-906] 💡 Error Message Enhancement
The KeyError message is good but could include a hint about using execute_tool before run():

raise KeyError(
    f"Tool '{tool_name}' not found. Available tools: {available_tools}. "
    f"Note: execute_tool() can be called before run() to execute tools early."
)

[STYLE NOTES]

[test_execute_tool.py]Good Test Coverage
The tests are comprehensive and cover:

  • Basic functionality
  • Agent initialization
  • Before/after send_message
  • Tool not found error
  • Multiple calls
  • Conversation context passing

VERDICT: ✅ Worth Merging

Core logic is sound, the solution is elegant, and it solves a real problem. The minor import organization issue doesn't block merging.

KEY INSIGHT:

The PR correctly identifies that the real problem was premature access to internal state (tools_map), and solves it by providing a proper public API (execute_tool) that handles initialization automatically. This is good API design - hiding complexity while providing a clean interface.

View full conversation

Copy link
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall solid implementation that solves the issue of executing tools before run(). The code is well-tested and the API design is clean. Found a few areas for improvement around error handling and documentation consistency.

Returns:
The observation returned by the tool execution

Raises:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: The docstring mentions "Raises: NotImplementedError: If the tool has no executor", but this exception is never raised in the LocalConversation implementation. Either add the check in the implementation or remove this from the docstring for accuracy.

)

# Execute the tool
return tool(action, self)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Consider adding a check to verify the tool has an executor before calling it, or clarify in the docstring what happens when executor is None. This would make the error behavior more explicit:

Suggested change
return tool(action, self)
# Execute the tool
if not tool.executor:
raise NotImplementedError(f"Tool '{tool_name}' has no executor")
return tool(action, self)

Comment on lines +74 to +87
# Using execute_tool allows running tools before conversation.run()
print("\nRunning sleeptime compute to index conversations...")
try:
sleeptime_result = conversation.execute_tool(
"sleeptime_compute", SleeptimeComputeAction()
)
print(f"Result: {sleeptime_result.message}")
print(f"Sessions processed: {sleeptime_result.sessions_processed}")
# Cast to the expected observation type for type-safe access
if isinstance(sleeptime_result, SleeptimeComputeObservation):
print(f"Result: {sleeptime_result.message}")
print(f"Sessions processed: {sleeptime_result.sessions_processed}")
else:
print(f"Result: {sleeptime_result.text}")
except KeyError as e:
print(f"Tool not available: {e}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: The error handling only catches KeyError, but tool execution itself could fail with other exceptions (e.g., if the tool logic raises an error). Consider catching a broader exception or documenting that tool execution errors will propagate:

Suggested change
# Using execute_tool allows running tools before conversation.run()
print("\nRunning sleeptime compute to index conversations...")
try:
sleeptime_result = conversation.execute_tool(
"sleeptime_compute", SleeptimeComputeAction()
)
print(f"Result: {sleeptime_result.message}")
print(f"Sessions processed: {sleeptime_result.sessions_processed}")
# Cast to the expected observation type for type-safe access
if isinstance(sleeptime_result, SleeptimeComputeObservation):
print(f"Result: {sleeptime_result.message}")
print(f"Sessions processed: {sleeptime_result.sessions_processed}")
else:
print(f"Result: {sleeptime_result.text}")
except KeyError as e:
print(f"Tool not available: {e}")
print("\nRunning sleeptime compute to index conversations...")
try:
sleeptime_result = conversation.execute_tool(
"sleeptime_compute", SleeptimeComputeAction()
)
# Cast to the expected observation type for type-safe access
if isinstance(sleeptime_result, SleeptimeComputeObservation):
print(f"Result: {sleeptime_result.message}")
print(f"Sessions processed: {sleeptime_result.sessions_processed}")
else:
print(f"Result: {sleeptime_result.text}")
except KeyError as e:
print(f"Tool not available: {e}")
except Exception as e:
print(f"Tool execution failed: {e}")

Comment on lines +80 to +85
# Cast to the expected observation type for type-safe access
if isinstance(sleeptime_result, SleeptimeComputeObservation):
print(f"Result: {sleeptime_result.message}")
print(f"Sessions processed: {sleeptime_result.sessions_processed}")
else:
print(f"Result: {sleeptime_result.text}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Nit: The isinstance check with fallback to .text suggests uncertainty about the return type. Since execute_tool() returns Observation, all observations should have a .text attribute. Consider simplifying this unless there's a specific reason for the type checking:

Suggested change
# Cast to the expected observation type for type-safe access
if isinstance(sleeptime_result, SleeptimeComputeObservation):
print(f"Result: {sleeptime_result.message}")
print(f"Sessions processed: {sleeptime_result.sessions_processed}")
else:
print(f"Result: {sleeptime_result.text}")
if isinstance(sleeptime_result, SleeptimeComputeObservation):
print(f"Result: {sleeptime_result.message}")
print(f"Sessions processed: {sleeptime_result.sessions_processed}")

Or if you want to always show the text as fallback:

Suggested change
# Cast to the expected observation type for type-safe access
if isinstance(sleeptime_result, SleeptimeComputeObservation):
print(f"Result: {sleeptime_result.message}")
print(f"Sessions processed: {sleeptime_result.sessions_processed}")
else:
print(f"Result: {sleeptime_result.text}")
print(f"Result: {sleeptime_result.text}")
if isinstance(sleeptime_result, SleeptimeComputeObservation):
print(f"Sessions processed: {sleeptime_result.sessions_processed}")

# Verify conversation was passed (result should contain conversation ID)
assert "conv_id:" in result.text
assert isinstance(result, ExecuteToolTestObservation)
assert result.result == "context_test"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Nit: Consider adding a test case for when a tool exists but has no executor (if that's a valid scenario). This would help verify the error handling is correct for that edge case.

@enyst
Copy link
Collaborator

enyst commented Feb 4, 2026

@OpenHands Do a /codereview-roasted on this PR diff.

@openhands-ai
Copy link

openhands-ai bot commented Feb 4, 2026

I'm on it! enyst can track my progress at all-hands.dev

@OpenHands OpenHands deleted a comment from openhands-ai bot Feb 4, 2026
@enyst
Copy link
Collaborator

enyst commented Feb 4, 2026

🟡 Taste Rating: Acceptable (but with some sharp edges)

You fixed the real problem (tools_map before init) without overengineering, so kudos. But the new API has a couple of “you’re gonna regret this later” quirks. Here’s the blunt review.

[CRITICAL ISSUES]

  • None. This won’t blow up production on merge.

[IMPROVEMENT OPPORTUNITIES]

  • [openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py, Lines 894–906] Bypasses confirmation/security flow: execute_tool() runs the tool directly, skipping the confirmation policy + security analyzer path the agent loop uses. That’s fine for power users, but it’s a silent behavioral change: destructive tools can now run with zero confirmation. At minimum, document that this method bypasses confirmation. Better: add an optional confirm: bool or pipe through the same confirmation policy used in Agent.step().

  • [tests/sdk/conversation/local/test_execute_tool.py, Lines 78–80 & 242–243] Global registry pollution: register_tool() is called at import time and never cleaned up. That means this test module permanently mutates the global tool registry, which can bleed into other tests and make order-dependence bugs. Use a fixture that snapshots/restores the registry, or register under a randomized name to avoid collisions.

  • [openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py, Lines 6–18] Import hygiene: The TYPE_CHECKING block is dropped mid‑import with extra blank lines. It’s minor, but it’s sloppy. Keep standard imports, then typing helpers, then local imports. You’re not in a Jupyter notebook—follow the style guide.

[STYLE NOTES]

  • [examples/01_standalone_sdk/30_tom_agent.py, Lines 71–86] The example now prints a generic “Result: …” for non‑SleeptimeComputeObservation results. That’s fine, but for a tutorial, you want deterministic output. Consider explicitly handling the Observation.text case with a clear label (e.g., Result text:) so readers don’t mistake it for the structured result.

VERDICT:

Worth merging once you acknowledge (or document) the confirmation bypass and stop leaking tools into the global registry from tests.

Key insight: You fixed the initialization bug, but the new API is a “sharp knife.” Make sure users understand it cuts straight through your safety rails.

@openhands-ai

This comment was marked as duplicate.

enyst added 2 commits February 4, 2026 19:25
Ensure test registry registrations are reset per test to avoid
cross-test pollution.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

30_tom_agent.py example fails: tools_map accessed before agent initialization

4 participants