Add conversation.execute_tool() method for pre-run tool execution#1833
Add conversation.execute_tool() method for pre-run tool execution#1833
Conversation
This commit adds a new execute_tool(tool_name, action) method to the Conversation class that allows executing tools before conversation.run() is called. This is useful for: - Pre-run setup operations (e.g., indexing repositories) - Manual tool execution for environment setup - Testing tool behavior outside the agent loop Changes: - Add abstract execute_tool method to BaseConversation - Implement execute_tool in LocalConversation with automatic agent initialization - Add NotImplementedError stub in RemoteConversation (not yet supported) - Update 30_tom_agent.py example to use the new execute_tool method - Add comprehensive tests for execute_tool functionality Fixes #1831 Co-authored-by: openhands <[email protected]>
|
Note: The |
|
@OpenHands please fix the failing actions on PR #1833 at branch |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
|
@OpenHands /codereview-roasted |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
Add the execute_tool method to MockConversation in test_base_span_management.py to fix pyright type checking errors. The abstract method was added to BaseConversation but the test mock class was missing the implementation. Co-authored-by: openhands <[email protected]>
Code Review Summary for PR #1833I performed a Linus Torvalds-style "roasted" code review of the 🔴 Taste Rating: Needs Improvement (but worth merging after fixes)Critical Issue Found:[remote_conversation.py, Lines 17-20] - The Improvement Opportunities:
What's Good:
Verdict: ✅ Worth merging after addressing the import organization issue in Key Insight: The fundamental design is sound - lazy initialization solves the actual user problem without over-engineering. Just needs cleanup of rough edges. No code changes were made - this was a review-only request. |
Coverage Report •
|
|||||||||||||||||||||||||
SummaryI fixed the failing GitHub Actions on PR #1833 by addressing the pre-commit check failure. IssueThe FixAdded the missing def execute_tool(self, tool_name: str, action: Action) -> Observation:
"""Mock implementation of execute_tool method."""
raise NotImplementedError("Mock execute_tool not implemented")Also added the necessary import: Changes
CI StatusAll 24 checks are now passing (1 skipped):
The commit has been pushed to the |
|
@OpenHands please do /codereview-roasted and |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
SummaryI ran the ✅ Does the PR Fix the Issue?Yes, the PR successfully fixes the issue! The example ran to completion:
Note: The example requires LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" uv run examples/01_standalone_sdk/30_tom_agent.py🔥 Linus Torvalds-Style Code ReviewTaste Rating: 🟢 Good Taste - Elegant, simple solutionThis PR demonstrates good engineering judgment. The problem was clear (accessing [CRITICAL ISSUES] - None foundThe core design is sound. The
[IMPROVEMENT OPPORTUNITIES][remote_conversation.py, Lines 17-20] 🔧 Import Organization from typing import TYPE_CHECKING, SupportsIndex, overload
# ... other imports ...
if TYPE_CHECKING:
from openhands.sdk.tool.schema import Action, Observation[local_conversation.py, Line 897-906] 💡 Error Message Enhancement raise KeyError(
f"Tool '{tool_name}' not found. Available tools: {available_tools}. "
f"Note: execute_tool() can be called before run() to execute tools early."
)[STYLE NOTES][test_execute_tool.py] ✅ Good Test Coverage
VERDICT: ✅ Worth MergingCore logic is sound, the solution is elegant, and it solves a real problem. The minor import organization issue doesn't block merging. KEY INSIGHT:The PR correctly identifies that the real problem was premature access to internal state ( |
all-hands-bot
left a comment
There was a problem hiding this comment.
Overall solid implementation that solves the issue of executing tools before run(). The code is well-tested and the API design is clean. Found a few areas for improvement around error handling and documentation consistency.
| Returns: | ||
| The observation returned by the tool execution | ||
|
|
||
| Raises: |
There was a problem hiding this comment.
🟡 Suggestion: The docstring mentions "Raises: NotImplementedError: If the tool has no executor", but this exception is never raised in the LocalConversation implementation. Either add the check in the implementation or remove this from the docstring for accuracy.
| ) | ||
|
|
||
| # Execute the tool | ||
| return tool(action, self) |
There was a problem hiding this comment.
🟡 Suggestion: Consider adding a check to verify the tool has an executor before calling it, or clarify in the docstring what happens when executor is None. This would make the error behavior more explicit:
| return tool(action, self) | |
| # Execute the tool | |
| if not tool.executor: | |
| raise NotImplementedError(f"Tool '{tool_name}' has no executor") | |
| return tool(action, self) |
| # Using execute_tool allows running tools before conversation.run() | ||
| print("\nRunning sleeptime compute to index conversations...") | ||
| try: | ||
| sleeptime_result = conversation.execute_tool( | ||
| "sleeptime_compute", SleeptimeComputeAction() | ||
| ) | ||
| print(f"Result: {sleeptime_result.message}") | ||
| print(f"Sessions processed: {sleeptime_result.sessions_processed}") | ||
| # Cast to the expected observation type for type-safe access | ||
| if isinstance(sleeptime_result, SleeptimeComputeObservation): | ||
| print(f"Result: {sleeptime_result.message}") | ||
| print(f"Sessions processed: {sleeptime_result.sessions_processed}") | ||
| else: | ||
| print(f"Result: {sleeptime_result.text}") | ||
| except KeyError as e: | ||
| print(f"Tool not available: {e}") |
There was a problem hiding this comment.
🟠 Important: The error handling only catches KeyError, but tool execution itself could fail with other exceptions (e.g., if the tool logic raises an error). Consider catching a broader exception or documenting that tool execution errors will propagate:
| # Using execute_tool allows running tools before conversation.run() | |
| print("\nRunning sleeptime compute to index conversations...") | |
| try: | |
| sleeptime_result = conversation.execute_tool( | |
| "sleeptime_compute", SleeptimeComputeAction() | |
| ) | |
| print(f"Result: {sleeptime_result.message}") | |
| print(f"Sessions processed: {sleeptime_result.sessions_processed}") | |
| # Cast to the expected observation type for type-safe access | |
| if isinstance(sleeptime_result, SleeptimeComputeObservation): | |
| print(f"Result: {sleeptime_result.message}") | |
| print(f"Sessions processed: {sleeptime_result.sessions_processed}") | |
| else: | |
| print(f"Result: {sleeptime_result.text}") | |
| except KeyError as e: | |
| print(f"Tool not available: {e}") | |
| print("\nRunning sleeptime compute to index conversations...") | |
| try: | |
| sleeptime_result = conversation.execute_tool( | |
| "sleeptime_compute", SleeptimeComputeAction() | |
| ) | |
| # Cast to the expected observation type for type-safe access | |
| if isinstance(sleeptime_result, SleeptimeComputeObservation): | |
| print(f"Result: {sleeptime_result.message}") | |
| print(f"Sessions processed: {sleeptime_result.sessions_processed}") | |
| else: | |
| print(f"Result: {sleeptime_result.text}") | |
| except KeyError as e: | |
| print(f"Tool not available: {e}") | |
| except Exception as e: | |
| print(f"Tool execution failed: {e}") |
| # Cast to the expected observation type for type-safe access | ||
| if isinstance(sleeptime_result, SleeptimeComputeObservation): | ||
| print(f"Result: {sleeptime_result.message}") | ||
| print(f"Sessions processed: {sleeptime_result.sessions_processed}") | ||
| else: | ||
| print(f"Result: {sleeptime_result.text}") |
There was a problem hiding this comment.
🟢 Nit: The isinstance check with fallback to .text suggests uncertainty about the return type. Since execute_tool() returns Observation, all observations should have a .text attribute. Consider simplifying this unless there's a specific reason for the type checking:
| # Cast to the expected observation type for type-safe access | |
| if isinstance(sleeptime_result, SleeptimeComputeObservation): | |
| print(f"Result: {sleeptime_result.message}") | |
| print(f"Sessions processed: {sleeptime_result.sessions_processed}") | |
| else: | |
| print(f"Result: {sleeptime_result.text}") | |
| if isinstance(sleeptime_result, SleeptimeComputeObservation): | |
| print(f"Result: {sleeptime_result.message}") | |
| print(f"Sessions processed: {sleeptime_result.sessions_processed}") |
Or if you want to always show the text as fallback:
| # Cast to the expected observation type for type-safe access | |
| if isinstance(sleeptime_result, SleeptimeComputeObservation): | |
| print(f"Result: {sleeptime_result.message}") | |
| print(f"Sessions processed: {sleeptime_result.sessions_processed}") | |
| else: | |
| print(f"Result: {sleeptime_result.text}") | |
| print(f"Result: {sleeptime_result.text}") | |
| if isinstance(sleeptime_result, SleeptimeComputeObservation): | |
| print(f"Sessions processed: {sleeptime_result.sessions_processed}") |
| # Verify conversation was passed (result should contain conversation ID) | ||
| assert "conv_id:" in result.text | ||
| assert isinstance(result, ExecuteToolTestObservation) | ||
| assert result.result == "context_test" |
There was a problem hiding this comment.
🟢 Nit: Consider adding a test case for when a tool exists but has no executor (if that's a valid scenario). This would help verify the error handling is correct for that edge case.
|
@OpenHands Do a /codereview-roasted on this PR diff. |
|
I'm on it! enyst can track my progress at all-hands.dev |
|
🟡 Taste Rating: Acceptable (but with some sharp edges) You fixed the real problem (tools_map before init) without overengineering, so kudos. But the new API has a couple of “you’re gonna regret this later” quirks. Here’s the blunt review. [CRITICAL ISSUES]
[IMPROVEMENT OPPORTUNITIES]
[STYLE NOTES]
VERDICT:✅ Worth merging once you acknowledge (or document) the confirmation bypass and stop leaking tools into the global registry from tests. Key insight: You fixed the initialization bug, but the new API is a “sharp knife.” Make sure users understand it cuts straight through your safety rails. |
This comment was marked as duplicate.
This comment was marked as duplicate.
Ensure test registry registrations are reset per test to avoid cross-test pollution.
Summary
This PR adds a new
execute_tool(tool_name, action)method to the Conversation class that allows executing tools beforeconversation.run()is called. This addresses issue #1831 where the30_tom_agent.pyexample was failing because it tried to accesstools_mapbefore the agent was initialized.The new method:
run()callChanges
execute_toolmethod toBaseConversationexecute_toolinLocalConversationwith automatic agent initializationNotImplementedErrorstub inRemoteConversation(not yet supported for remote conversations)30_tom_agent.pyexample to use the newexecute_toolmethodexecute_toolfunctionalityExample Usage
Fixes #1831
Checklist
@xingyaoww can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:e70174f-pythonRun
All tags pushed for this build
About Multi-Architecture Support
e70174f-python) is a multi-arch manifest supporting both amd64 and arm64e70174f-python-amd64) are also available if needed