Add action summary feature for tool calls #1339
Conversation
Implement optional 10-word summaries for agent actions, following the same pattern as the security risk analyzer feature.
Changes:
- Add summary field to ActionEvent with visualization support
- Add enable_action_summaries flag to ConversationState
- Create _extract_summary() method in Agent with validation
- Update tool schema methods to support add_summary_prediction flag
- Integrate summary extraction into agent step flow
- Add comprehensive tests for all components
The feature can be enabled by calling conversation.set_enable_action_summaries(True), which prompts the LLM to provide a brief summary with each tool call.
Co-authored-by: openhands <[email protected]>
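For orientation, a minimal sketch of what the new ActionEvent field could look like. The field name and the visualize behavior come from the commit above; the class shape, Field usage, and rendering are assumptions, not the actual SDK code.

```python
# Illustrative sketch only; the real class lives in
# openhands/sdk/event/llm_convertible/action.py and has more fields.
from pydantic import BaseModel, Field


class ActionEvent(BaseModel):
    tool_name: str  # assumed existing field, shown for context
    summary: str | None = Field(
        default=None,
        description="Optional ~10-word summary of what this action does.",
    )

    @property
    def visualize(self) -> str:
        # Display the summary only when the LLM provided one (rendering assumed).
        return f"Summary: {self.summary}" if self.summary else ""
```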
Allow users to enable action summaries when creating a conversation:
- Add enable_action_summaries parameter to LocalConversation.__init__()
- Add enable_action_summaries parameter to ConversationState.create()
- Update documentation and example to show both initialization and dynamic enabling
- Keep set_enable_action_summaries() method for dynamic toggling
Usage:
conversation = LocalConversation(
agent=agent,
workspace="/tmp",
enable_action_summaries=True
)
Co-authored-by: openhands <[email protected]>
Simplify configuration by removing the dynamic setter method. Action summaries are now configured only at initialization time via the enable_action_summaries parameter. Co-authored-by: openhands <[email protected]>
@OpenHands Review the diff against main, and help me understand what the best and simplest way to implement this is. Do you think we can just always enable this "summary" feature, since it is just a few words and shouldn't hurt the cost too much? If so, please implement it and remove all the unnecessary plumbing.
I'm on it! xingyaoww can track my progress at all-hands.dev |
Remove enable_action_summaries flag and associated plumbing to simplify the implementation. The summary field is now always requested from the LLM and injected into tool schemas, providing better agent transparency at minimal cost (~10-15 tokens per action).
Key changes:
- Remove enable_action_summaries from ConversationState
- Remove add_summary_prediction parameter from tool.py, llm.py, router, agent
- Remove enable_action_summaries from LocalConversation.__init__()
- Summary is always injected into tool schemas
- ActionEvent.summary can still be None if LLM doesn't provide it
- All tests updated to include summary field in mock tool calls
Co-authored-by: openhands <[email protected]>
Summary
I've successfully simplified the action summary feature by always enabling the summary field and removing all unnecessary configuration plumbing.
Analysis & Decision
Question answered: Yes, we should always enable the summary feature! The summary field adds only ~10-15 tokens per action (a few words), which is minimal cost compared to the high explainability value it provides. This makes the feature much simpler - no need for complex configuration.
Changes Made
Simplified Implementation:
Quality Assurance:
Result
The implementation is now much cleaner with:
The changes have been committed and pushed to the PR branch.
The summary field is always requested from LLM for transparency, but if the LLM doesn't provide it or provides invalid data, we gracefully handle it by returning None instead of raising errors.
This approach:
- Minimizes changes to existing tests (no need to add summary to all mocks)
- Makes the feature robust (doesn't break if LLM forgets summary)
- Still provides transparency when LLM does provide summaries
- Follows the principle of 'request but don't require'
Changes:
- _extract_summary now returns str | None and never raises errors
- summary field in tool schema has default=None (optional)
- Updated only 3 test files to account for summary field
- Reverted changes to 11 test files that no longer need summary
Co-authored-by: openhands <[email protected]>
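To make the "request but don't require" approach concrete, here is a rough sketch of how an optional summary field might be injected into the dynamically created action type. The cache dict mirrors the _action_types_with_summary variable visible later in the diff, but the helper body, create_model usage, and description text are assumptions rather than the PR's exact code.

```python
# Sketch, not the actual tool.py implementation: ask the LLM for a summary
# but never require one ("request but don't require").
from pydantic import BaseModel, Field, create_model

_action_types_with_summary: dict[type, type] = {}  # cache, as in the PR diff


def _create_action_type_with_summary(action_type: type[BaseModel]) -> type[BaseModel]:
    """Return a subclass of action_type with an optional `summary` field."""
    if action_type not in _action_types_with_summary:
        _action_types_with_summary[action_type] = create_model(
            f"{action_type.__name__}WithSummary",
            __base__=action_type,
            summary=(
                str | None,
                Field(default=None, description="A brief ~10-word summary of this action."),
            ),
        )
    return _action_types_with_summary[action_type]
```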
Addressed feedback to provide concrete examples of what good action summaries should look like. This helps guide LLM models in generating appropriate, concise summaries that are action-oriented and specific.
Examples include:
- editing configuration file for deployment settings
- searching codebase for authentication function definitions
- installing required dependencies from package manifest
- running tests to verify bug fix
- viewing directory structure to locate source files
Co-authored-by: openhands <[email protected]>
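A hypothetical sketch of how these examples could be folded into the summary field's schema description; the constant name and exact wording are illustrative, not taken from the PR.

```python
# Hypothetical constant; the real description text in the PR may differ.
SUMMARY_FIELD_DESCRIPTION = (
    "A brief (~10 word), action-oriented summary of this tool call, e.g. "
    "'editing configuration file for deployment settings', "
    "'searching codebase for authentication function definitions', "
    "'running tests to verify bug fix'."
)
```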
When the LLM doesn't provide a summary (or provides invalid data),
we now generate a default summary in the format:
{tool_name}: {arguments}
This ensures transparency even when the LLM doesn't explicitly
provide a summary, making it easier for users to understand
what action is being taken.
Changes:
- Modified _extract_summary to accept tool_name parameter
- Generate default summary using tool name and JSON-serialized arguments
- Updated all tests to reflect new behavior
- All tests passing (107 agent tests, 10 extract_summary tests)
Co-authored-by: openhands <[email protected]>
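A compact sketch of the fallback behavior described in this commit; the real method is Agent._extract_summary, and its exact signature and validation details may differ.

```python
# Sketch of the fallback described above (signature assumed).
import json
from typing import Any


def _extract_summary(arguments: dict[str, Any], tool_name: str) -> str:
    """Pop the LLM-provided summary, or build a default '{tool_name}: {arguments}'."""
    summary = arguments.pop("summary", None)  # pop() so the tool never sees it
    if isinstance(summary, str) and summary.strip():
        return summary.strip()
    # Default keeps the action transparent even when the LLM omits the summary.
    return f"{tool_name}: {json.dumps(arguments, ensure_ascii=False)}"
```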
The assertion checking if 'summary' was removed from arguments is redundant because _extract_summary always uses pop() to remove it. Co-authored-by: openhands <[email protected]>
[Automatic Post]: It has been a while since there was any activity on this PR. @xingyaoww, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
@OpenHands please merge from main, resolve all the conflicts, and make this PR introduce as few changes as possible while still implementing the same feature.
I'm on it! xingyaoww can track my progress at all-hands.dev |
Resolved conflicts in:
- openhands-sdk/openhands/sdk/agent/utils.py
- openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
- openhands-sdk/openhands/sdk/tool/tool.py
Co-authored-by: openhands <[email protected]>
Summary
I successfully merged main into the PR branch and resolved the conflicts.
Merge Conflicts Resolved
PR Changes Summary
The PR introduces a minimal action summary feature with 12 files changed (+487/-17 lines):
Verification
The PR is now up-to-date with main and ready for review. |
- Add verify_all_actions_have_summary() utility function in behavior_utils.py
- Update b04_each_tool_call_has_a_concise_explanation.py to check summary field presence
- Add unit tests for the new utility function
Co-authored-by: openhands <[email protected]>
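A rough sketch of what the new utility might look like; the function name comes from the commit above, while the import path and the exact check are assumptions based on the rest of this PR.

```python
# Sketch only; the real helper lives in behavior_utils.py (import path assumed).
from openhands.sdk.event.llm_convertible.action import ActionEvent


def verify_all_actions_have_summary(events: list) -> bool:
    """Return True if every ActionEvent in `events` carries a non-empty summary."""
    return all(
        isinstance(event.summary, str) and event.summary.strip()
        for event in events
        if isinstance(event, ActionEvent)
    )
```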
@OpenHands review diff against main and remove unnecessary tests
I'm on it! xingyaoww can track my progress at all-hands.dev |
Simplified test files by removing redundant tests:
- test_extract_summary.py: Removed duplicate tests covered by parametrized test
- test_action_event_summary.py: Removed tests for Pydantic built-in functionality
- test_to_responses_tool_summary.py: Removed redundant security+summary combo tests
Reduced from 24 tests to 9 tests while maintaining full coverage of the actual implementation logic.
Co-authored-by: openhands <[email protected]>
Summary
I reviewed the diff against main and removed unnecessary tests from the action summary feature. The changes have been pushed to the PR branch.
Changes Made
Simplified the test files by removing redundant tests:
Result
All tests pass and pre-commit hooks pass. The reduced test suite still provides full coverage of the actual implementation logic while following the repository guidelines to avoid testing Pydantic's built-in functionality. |
@OpenHands help me fix the test: TOTAL 14770 7079 52%
I'm on it! xingyaoww can track my progress at all-hands.dev |
@OpenHands help me open a new PR in OpenHands/OpenHands-CLI that will use this branch of the SDK, and use the summary from the action to replace the current action summary generated by the string-truncation heuristic.
I'm on it! xingyaoww can track my progress at all-hands.dev |
Summary
I've successfully created a new PR in OpenHands/OpenHands-CLI that uses the action summary feature from this SDK branch.
PR Created: OpenHands/OpenHands-CLI#303
Changes Made
Checklist
Note
This PR is created as a draft since it depends on OpenHands/software-agent-sdk PR #1339 being merged first. Once that PR is merged, the SDK dependency should be updated to use the released version.
Summary
The test failure you reported appears to have been a transient issue that has since been resolved. The test validates that:
I also ran the full SDK and cross test suites (1706 tests) and all tests pass. No code changes were needed - the branch |
@OpenHands please fix the failing actions on PR #1339 at branch
I'm on it! xingyaoww can track my progress at all-hands.dev |
ActionT = TypeVar("ActionT", bound=Action)
ObservationT = TypeVar("ObservationT", bound=Observation)
_action_types_with_risk: dict[type, type] = {}
_action_types_with_summary: dict[type, type] = {}
nice with the caching!
malhotra5 left a comment
LGTM!
The MCP tests were timing out in CI because uvx may need to download packages on first run. Increased timeout from default 30s to 120s. Co-authored-by: openhands <[email protected]>
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, and feel free to include any additional details that might help me get this PR into a better state.
🧪 Integration Tests Results
Overall Success Rate: 98.0%
📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results
litellm_proxy_deepseek_deepseek_chat
Skipped Tests:
litellm_proxy_mistral_devstral_2512
Skipped Tests:
Failed Tests:
litellm_proxy_claude_sonnet_4_5_20250929
litellm_proxy_vertex_ai_gemini_3_pro_preview
litellm_proxy_gpt_5.1_codex_max
Skipped Tests:
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
Summary
This PR implements an optional action summary feature that allows the LLM to provide brief (~10 word) summaries with each tool call, describing what the agent is doing. The implementation follows the same pattern as the existing security risk analyzer feature.
Motivation
Agent actions can sometimes be opaque to users and developers. This feature provides transparency by adding human-readable summaries to each action, making it easier to:
Changes
Core Implementation
ActionEvent Enhancement (action.py)
- Add summary: str | None field to store summaries
- Update the visualize property to display the summary when present

Conversation Configuration
- Add enable_action_summaries: bool field to ConversationState (default: False)
- Add the parameter to LocalConversation.__init__() for initialization-time configuration

Tool Schema Enhancement (tool.py)
- Add _create_action_type_with_summary() helper function
- Add add_summary_prediction: bool parameter to schema methods

LLM Integration (llm.py, router/base.py, agent/utils.py)
- Pass the add_summary_prediction: bool parameter throughout the LLM pipeline

Agent Processing (agent/agent.py)
- Add _extract_summary() method with comprehensive validation

Testing
Added 24 comprehensive tests across 3 test files:
- test_extract_summary.py
- test_to_responses_tool_summary.py
- test_action_event_summary.py

All tests pass and all pre-commit hooks pass (ruff, pyright, pycodestyle, etc.).
Usage
Enable action summaries at conversation creation:
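For reference, this is the same usage shown in the commit message earlier in this thread; the import path is assumed from the files-changed list, and agent is an Agent instance configured elsewhere.

```python
# Mirrors the usage from the commit message above; import path is assumed.
from openhands.sdk.conversation.impl.local_conversation import LocalConversation

conversation = LocalConversation(
    agent=agent,  # an Agent instance configured elsewhere
    workspace="/tmp",
    enable_action_summaries=True,
)
```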
Example Output
When enabled, ActionEvents include summaries in their visualization:
Design Decisions
Pattern Consistency: Follows the exact pattern used by the security risk analyzer for consistency and maintainability
Backward Compatibility: Feature is disabled by default; no breaking changes to existing APIs
Validation Strategy: Strict validation when enabled (must be present, must be string, must not be empty)
Configuration Simplicity: Set at initialization time only - no dynamic toggling needed
Composability: Works alongside security risk analysis - both fields can be present simultaneously
Comparison with Security Risk Feature
| | Security Risk | Action Summary |
| --- | --- | --- |
| Field type | SecurityRisk enum | optional string |
| Schema flag | add_security_risk_prediction | add_summary_prediction |
| Enabled via | SecurityAnalyzer | enable_action_summaries |

Files Changed
- openhands-sdk/openhands/sdk/event/llm_convertible/action.py
- openhands-sdk/openhands/sdk/conversation/state.py
- openhands-sdk/openhands/sdk/conversation/base.py
- openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
- openhands-sdk/openhands/sdk/tool/tool.py
- openhands-sdk/openhands/sdk/llm/llm.py
- openhands-sdk/openhands/sdk/llm/router/base.py
- openhands-sdk/openhands/sdk/agent/agent.py
- openhands-sdk/openhands/sdk/agent/utils.py
- tests/sdk/agent/test_extract_summary.py (new)
- tests/sdk/tool/test_to_responses_tool_summary.py (new)
- tests/sdk/event/test_action_event_summary.py (new)
- IMPLEMENTATION_SUMMARY.md (new - internal documentation)

Testing Checklist
Future Enhancements
Potential improvements for future iterations:
Co-authored-by: openhands [email protected]
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.12-nodejs22
- golang:1.21-bookworm

Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:90574b1-python

Run
All tags pushed for this build
About Multi-Architecture Support
- The variant tag (90574b1-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. 90574b1-python-amd64) are also available if needed