Conversation

@xingyaoww (Collaborator) commented Dec 5, 2025

Summary

This PR implements an optional action summary feature that allows the LLM to provide brief (~10 word) summaries with each tool call, describing what the agent is doing. The implementation follows the same pattern as the existing security risk analyzer feature.

Motivation

Agent actions can sometimes be opaque to users and developers. This feature provides transparency by adding human-readable summaries to each action, making it easier to:

  • Track what the agent is doing at each step
  • Debug agent behavior
  • Understand agent decisions
  • Maintain audit trails
  • Improve user experience

Changes

Core Implementation

  1. ActionEvent Enhancement (action.py)

    • Added summary: str | None field to store summaries
    • Updated visualize property to display summary when present
  2. Conversation Configuration

    • Added enable_action_summaries: bool field to ConversationState (default: False)
    • Added parameter to LocalConversation.__init__() for initialization-time configuration
    • Configuration is set at conversation creation time only
  3. Tool Schema Enhancement (tool.py)

    • Created _create_action_type_with_summary() helper function
    • Added add_summary_prediction: bool parameter to schema methods
    • Dynamically injects a summary field into tool schemas when enabled (a sketch follows this list)
  4. LLM Integration (llm.py, router/base.py, agent/utils.py)

    • Added add_summary_prediction: bool parameter throughout the LLM pipeline
    • Propagates the flag from conversation state to tool schemas
  5. Agent Processing (agent/agent.py)

    • Created _extract_summary() method with comprehensive validation
    • Validates summary is a string, not empty, and present when required
    • Integrated summary extraction into the action event workflow
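
For reference, a minimal sketch of how the schema injection in item 3 could work, assuming Pydantic's create_model; the field description is paraphrased from this PR, and the actual helper in tool.py may differ in details:

from pydantic import Field, create_model

_action_types_with_summary: dict[type, type] = {}

def _create_action_type_with_summary(action_type: type) -> type:
    # Build (and cache) a subclass of the action type that carries an
    # optional "summary" field, so the LLM is prompted for it per call.
    if action_type not in _action_types_with_summary:
        _action_types_with_summary[action_type] = create_model(
            f"{action_type.__name__}WithSummary",
            __base__=action_type,
            summary=(
                str | None,
                Field(
                    default=None,
                    description="Brief (~10 word) summary of what this action does.",
                ),
            ),
        )
    return _action_types_with_summary[action_type]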

Testing

Added 24 comprehensive tests across 3 test files:

  • 13 tests for summary extraction and validation (test_extract_summary.py)
  • 6 tests for tool schema enhancement (test_to_responses_tool_summary.py)
  • 5 tests for ActionEvent integration (test_action_event_summary.py)

All tests pass and all pre-commit hooks pass (ruff, pyright, pycodestyle, etc.).

Usage

Enable action summaries at conversation creation:

conversation = LocalConversation(
    agent=agent,
    workspace="/tmp",
    enable_action_summaries=True  # Enable at initialization
)

Example Output

When enabled, ActionEvents include summaries in their visualization:

🛠️  ACTION (agent)
Tool: str_replace_editor
Summary: editing configuration file for deployment settings
Security: MEDIUM
Arguments: {...}

Design Decisions

  1. Pattern Consistency: Follows the exact pattern used by the security risk analyzer for consistency and maintainability

  2. Backward Compatibility: Feature is disabled by default; no breaking changes to existing APIs

  3. Validation Strategy: Strict validation when enabled (must be present, must be string, must not be empty)

  4. Configuration Simplicity: Set at initialization time only - no dynamic toggling needed

  5. Composability: Works alongside security risk analysis - both fields can be present simultaneously

Comparison with Security Risk Feature

Aspect       Security Risk                  Action Summary
Field Type   SecurityRisk enum              str | None
Flag Name    add_security_risk_prediction   add_summary_prediction
State Field  Via SecurityAnalyzer           enable_action_summaries
Validation   Enum validation                String type + content validation
Can Combine  ✅ Yes - both fields can be present

Files Changed

  • openhands-sdk/openhands/sdk/event/llm_convertible/action.py
  • openhands-sdk/openhands/sdk/conversation/state.py
  • openhands-sdk/openhands/sdk/conversation/base.py
  • openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
  • openhands-sdk/openhands/sdk/tool/tool.py
  • openhands-sdk/openhands/sdk/llm/llm.py
  • openhands-sdk/openhands/sdk/llm/router/base.py
  • openhands-sdk/openhands/sdk/agent/agent.py
  • openhands-sdk/openhands/sdk/agent/utils.py
  • tests/sdk/agent/test_extract_summary.py (new)
  • tests/sdk/tool/test_to_responses_tool_summary.py (new)
  • tests/sdk/event/test_action_event_summary.py (new)
  • IMPLEMENTATION_SUMMARY.md (new - internal documentation)

Testing Checklist

  • All new tests pass (24/24)
  • All existing tests still pass
  • Ruff format passes
  • Ruff lint passes
  • Pyright type checking passes
  • PEP8 style checks pass
  • Import dependency rules pass
  • Tool registration checks pass

Future Enhancements

Potential improvements for future iterations:

  • Enforce 10-word limit in validation
  • Add summary quality metrics
  • Support multi-language summaries
  • Integration with telemetry/monitoring systems

Co-authored-by: openhands [email protected]


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant  Architectures  Base Image
java     amd64, arm64   eclipse-temurin:17-jdk
python   amd64, arm64   nikolaik/python-nodejs:python3.12-nodejs22
golang   amd64, arm64   golang:1.21-bookworm

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:90574b1-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-90574b1-python \
  ghcr.io/openhands/agent-server:90574b1-python

All tags pushed for this build

ghcr.io/openhands/agent-server:90574b1-golang-amd64
ghcr.io/openhands/agent-server:90574b1-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:90574b1-golang-arm64
ghcr.io/openhands/agent-server:90574b1-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:90574b1-java-amd64
ghcr.io/openhands/agent-server:90574b1-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:90574b1-java-arm64
ghcr.io/openhands/agent-server:90574b1-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:90574b1-python-amd64
ghcr.io/openhands/agent-server:90574b1-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:90574b1-python-arm64
ghcr.io/openhands/agent-server:90574b1-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:90574b1-golang
ghcr.io/openhands/agent-server:90574b1-java
ghcr.io/openhands/agent-server:90574b1-python

About Multi-Architecture Support

  • Each variant tag (e.g., 90574b1-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 90574b1-python-amd64) are also available if needed

Implement optional 10-word summaries for agent actions, following the same pattern as the security risk analyzer feature.

Changes:
- Add summary field to ActionEvent with visualization support
- Add enable_action_summaries flag to ConversationState
- Create _extract_summary() method in Agent with validation
- Update tool schema methods to support add_summary_prediction flag
- Integrate summary extraction into agent step flow
- Add comprehensive tests for all components

The feature can be enabled by calling conversation.set_enable_action_summaries(True), which prompts the LLM to provide a brief summary with each tool call.

Co-authored-by: openhands <[email protected]>

Allow users to enable action summaries when creating a conversation:
- Add enable_action_summaries parameter to LocalConversation.__init__()
- Add enable_action_summaries parameter to ConversationState.create()
- Update documentation and example to show both initialization and dynamic enabling
- Keep set_enable_action_summaries() method for dynamic toggling

Usage:
  conversation = LocalConversation(
      agent=agent,
      workspace="/tmp",
      enable_action_summaries=True
  )

Co-authored-by: openhands <[email protected]>

Simplify configuration by removing the dynamic setter method.
Action summaries are now configured only at initialization time
via the enable_action_summaries parameter.

Co-authored-by: openhands <[email protected]>
@xingyaoww (Collaborator, Author) commented:

@OpenHands Review the diff against main and help me understand the best and simplest way to implement this. Do you think we can just always enable this "summary" feature, since it is just a few words and shouldn't hurt the cost too much?

If so, please implement it and remove all the unnecessary plumbing

@openhands-ai bot commented Dec 5, 2025

I'm on it! xingyaoww can track my progress at all-hands.dev

Remove enable_action_summaries flag and associated plumbing to simplify
the implementation. The summary field is now always requested from the LLM
and injected into tool schemas, providing better agent transparency at
minimal cost (~10-15 tokens per action).

Key changes:
- Remove enable_action_summaries from ConversationState
- Remove add_summary_prediction parameter from tool.py, llm.py, router, agent
- Remove enable_action_summaries from LocalConversation.__init__()
- Summary is always injected into tool schemas
- ActionEvent.summary can still be None if LLM doesn't provide it
- All tests updated to include summary field in mock tool calls

Co-authored-by: openhands <[email protected]>
@openhands-ai bot commented Dec 5, 2025

Summary

I've successfully simplified the action summary feature by always enabling the summary field and removing all unnecessary configuration plumbing.

Analysis & Decision

Question answered: Yes, we should always enable the summary feature!

The summary field adds only ~10-15 tokens per action (a few words), which is a minimal cost compared to the explainability it provides. This makes the feature much simpler: no complex configuration is needed.

Changes Made

Simplified Implementation:

  1. Removed configuration flag - enable_action_summaries removed from ConversationState
  2. Removed parameter plumbing - add_summary_prediction parameter removed from:
    • tool.py
    • llm.py
    • router/base.py
    • agent/agent.py
    • agent/utils.py
    • LocalConversation.__init__()
  3. Summary always injected - Tool schemas now always include the summary field
  4. Maintained flexibility - ActionEvent.summary can still be None if LLM doesn't provide it
  5. Updated all tests - Modified 13 test files to include summary field in mock tool calls
  6. Removed internal docs - Deleted IMPLEMENTATION_SUMMARY.md

Quality Assurance:

  • ✅ All 1352 tests passing
  • ✅ All pre-commit checks passing (ruff, pyright, pycodestyle)
  • ✅ Fixed all line length violations

Result

The implementation is now much cleaner with:

  • Less complexity - No configuration flags or conditional logic
  • Better transparency - Summary always available for every action
  • Minimal cost - ~10-15 tokens per action
  • Same functionality - Core feature preserved, just simplified

The changes have been committed and pushed to the feature/action-summary branch, updating PR #1339.

The summary field is always requested from LLM for transparency,
but if the LLM doesn't provide it or provides invalid data, we
gracefully handle it by returning None instead of raising errors.

This approach:
- Minimizes changes to existing tests (no need to add summary to all mocks)
- Makes the feature robust (doesn't break if LLM forgets summary)
- Still provides transparency when LLM does provide summaries
- Follows the principle of 'request but don't require'

Changes:
- _extract_summary now returns str | None and never raises errors
- summary field in tool schema has default=None (optional)
- Updated only 3 test files to account for summary field
- Reverted changes to 11 test files that no longer need summary

Co-authored-by: openhands <[email protected]>
@github-actions bot (Contributor) commented Dec 5, 2025

Coverage

Coverage Report

File                                                          Stmts   Miss  Cover
openhands-sdk/openhands/sdk/agent/agent.py                      206     72    65%
  Missing: 89, 93, 144, 148–149, 157–159, 169–170, 186–188, 195–197, 199, 203, 206–207, 209–210, 228, 255, 260, 271, 310, 315, 326, 329, 352, 362–363, 383, 408–410, 412, 424–425, 430–431, 453–454, 459, 471–472, 477–478, 517–518, 524–525, 529, 537–538, 546, 549–551, 556–557, 578, 585, 589–590, 628–630, 633–634, 638
openhands-sdk/openhands/sdk/agent/utils.py                       57     18    68%
  Missing: 63, 77, 83–84, 101–102, 105–107, 110, 168, 170–172, 174–175, 182, 217
openhands-sdk/openhands/sdk/event/llm_convertible/action.py      64     15    76%
  Missing: 88, 98–100, 112–118, 125–126, 155–156
openhands-sdk/openhands/sdk/llm/llm.py                          402    153    61%
  Missing: 344, 349, 353, 357–358, 361, 365–366, 377–378, 380–381, 385, 402, 420–423, 512–514, 535, 539, 554, 560–561, 585–586, 596, 621–626, 661–662, 665, 669, 681, 686–689, 698, 706–713, 717–720, 722, 735, 739–740, 742–743, 748–749, 751, 758, 761–766, 823–828, 885–886, 889–892, 934, 951, 1005, 1008, 1011–1019, 1023–1025, 1028, 1031–1033, 1040–1041, 1050, 1057–1059, 1063, 1065–1070, 1072–1089, 1092–1096, 1098–1099, 1105–1114
openhands-sdk/openhands/sdk/llm/router/base.py                   42     17    59%
  Missing: 44–45, 48, 76–77, 79, 82, 109–111, 115, 121–123, 126–127, 129
openhands-sdk/openhands/sdk/tool/tool.py                        146     27    81%
  Missing: 223, 255, 265, 280, 306, 315, 318–324, 341, 346–349, 351, 358–360, 432, 461, 464, 468, 476
TOTAL                                                         14896   7151    51%

Addressed feedback to provide concrete examples of what good action summaries
should look like. This helps guide LLMs in generating appropriate, concise
summaries that are action-oriented and specific.

Examples include:
- editing configuration file for deployment settings
- searching codebase for authentication function definitions
- installing required dependencies from package manifest
- running tests to verify bug fix
- viewing directory structure to locate source files

Co-authored-by: openhands <[email protected]>

When the LLM doesn't provide a summary (or provides invalid data),
we now generate a default summary in the format:
{tool_name}: {arguments}

This ensures transparency even when the LLM doesn't explicitly
provide a summary, making it easier for users to understand
what action is being taken.

Changes:
- Modified _extract_summary to accept tool_name parameter
- Generate default summary using tool name and JSON-serialized arguments
- Updated all tests to reflect new behavior
- All tests passing (107 agent tests, 10 extract_summary tests)

Co-authored-by: openhands <[email protected]>
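
A minimal sketch of the extraction-with-fallback behavior described above (the real _extract_summary in agent.py may differ; the JSON serialization of arguments here is an assumption):

import json

def _extract_summary(arguments: dict, tool_name: str) -> str:
    # Pop the summary so it never reaches the actual tool call.
    raw = arguments.pop("summary", None)
    if isinstance(raw, str) and raw.strip():
        return raw.strip()
    # Fallback: synthesize "{tool_name}: {arguments}" when the LLM
    # omits the summary or returns invalid data.
    return f"{tool_name}: {json.dumps(arguments)}"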

The assertion checking if 'summary' was removed from arguments is
redundant because _extract_summary always uses pop() to remove it.

Co-authored-by: openhands <[email protected]>
@blacksmith-sh bot (Contributor) commented Dec 21, 2025

[Automatic Post]: It has been a while since there was any activity on this PR. @xingyaoww, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

2 similar comments

@xingyaoww (Collaborator, Author) commented:

@OpenHands please merge from main, resolve all the conflicts, and make this PR introduce as little change as possible while still implementing the same feature

@openhands-ai bot commented Jan 5, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

Resolved conflicts in:
- openhands-sdk/openhands/sdk/agent/utils.py
- openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
- openhands-sdk/openhands/sdk/tool/tool.py

Co-authored-by: openhands <[email protected]>
@openhands-ai bot commented Jan 5, 2026

Summary

I successfully merged origin/main into the feature/action-summary branch and resolved all merge conflicts. The changes have been pushed to the remote branch.

Merge Conflicts Resolved

  1. openhands-sdk/openhands/sdk/agent/utils.py: Combined docstring notes from both branches - kept main's security_risk documentation and added the summary field note.

  2. openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py: Merged docstring parameters - kept main's stuck_detection_thresholds documentation and the feature branch's secrets parameter.

  3. openhands-sdk/openhands/sdk/tool/tool.py: Fixed function call from _create_action_type_with_risk (private) to create_action_type_with_risk (public function used by other modules).

PR Changes Summary

The PR introduces a minimal action summary feature with 12 files changed (+487/-17 lines):

  • Core feature: Adds a summary field to ActionEvent that provides a brief (~10 word) description of what each action does
  • Always enabled: Summary field is always added to tool schemas for transparency (no configuration needed)
  • Graceful fallback: If LLM doesn't provide a summary, a default is generated in format {tool_name}: {arguments}
  • Visualization: Summary is displayed in action event visualization when present

Verification

  • ✅ All merge conflicts resolved
  • ✅ Pre-commit hooks pass (Ruff format, lint, pycodestyle, pyright)
  • ✅ All 19 summary-related tests pass
  • ✅ Tool definition tests pass (26 tests)
  • ✅ Tool execution error handling tests pass (2 tests)
  • ✅ Changes pushed to origin/feature/action-summary

The PR is now up-to-date with main and ready for review.

openhands-agent and others added 3 commits January 5, 2026 16:14
- Add verify_all_actions_have_summary() utility function in behavior_utils.py (sketched below)
- Update b04_each_tool_call_has_a_concise_explanation.py to check summary field presence
- Add unit tests for the new utility function

Co-authored-by: openhands <[email protected]>
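
A plausible shape for that utility, sketched with an import path inferred from the files listed earlier (the actual implementation in behavior_utils.py may differ):

from openhands.sdk.event.llm_convertible.action import ActionEvent

def verify_all_actions_have_summary(events: list) -> bool:
    # True only when every ActionEvent in the run carries a non-empty summary.
    actions = [e for e in events if isinstance(e, ActionEvent)]
    return all(isinstance(a.summary, str) and a.summary.strip() for a in actions)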
@xingyaoww (Collaborator, Author) commented:

@OpenHands review the diff against main and remove unnecessary tests

@openhands-ai bot commented Jan 7, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

Simplified test files by removing redundant tests:
- test_extract_summary.py: Removed duplicate tests covered by parametrized test
- test_action_event_summary.py: Removed tests for Pydantic built-in functionality
- test_to_responses_tool_summary.py: Removed redundant security+summary combo tests

Reduced from 24 tests to 9 tests while maintaining full coverage of the
actual implementation logic.

Co-authored-by: openhands <[email protected]>
@openhands-ai bot commented Jan 7, 2026

Summary

I reviewed the diff against main and removed unnecessary tests from the action summary feature. The changes have been pushed to the feature/action-summary branch.

Changes Made

Simplified the test files by removing redundant tests:

  1. tests/sdk/agent/test_extract_summary.py:

    • Removed test_extract_summary_arguments_mutation (covered by parametrized test)
    • Removed test_extract_summary_default_generation (covered by parametrized test)
    • Reduced from 3 test functions to 1 parametrized test with 5 cases
  2. tests/sdk/event/test_action_event_summary.py:

    • Removed test_action_event_with_summary (tests Pydantic field assignment)
    • Removed test_action_event_without_summary (tests Pydantic default)
    • Removed test_action_event_summary_serialization (tests Pydantic's built-in model_dump/model_validate)
    • Kept only the 2 visualization tests that test actual implementation logic
  3. tests/sdk/tool/test_to_responses_tool_summary.py:

    • Removed test_to_responses_tool_summary_and_security (redundant combo test)
    • Removed test_to_openai_tool_summary_and_security (redundant combo test)
    • Kept only the 2 essential tests for summary field presence

Result

  • Before: 24 tests across 3 files
  • After: 9 tests across 3 files (5 parametrized + 2 visualization + 2 schema tests)

All tests pass and pre-commit hooks pass. The reduced test suite still provides full coverage of the actual implementation logic while following the repository guidelines to avoid testing Pydantic's built-in functionality.

@xingyaoww xingyaoww requested a review from enyst January 7, 2026 17:15
@xingyaoww xingyaoww marked this pull request as ready for review January 7, 2026 17:15
@xingyaoww (Collaborator, Author) commented:

@OpenHands help me fix the test:

TOTAL 14770 7079 52%
=========================== short test summary info ============================
FAILED tests/cross/test_remote_conversation_live_server.py::test_security_risk_field_with_live_server - AssertionError: Expected to find ActionEvent with finish tool even without security_risk
assert False
================== 1 failed, 53 passed, 14 warnings in 36.80s ==================

@xingyaoww xingyaoww requested review from malhotra5 and neubig January 7, 2026 17:15
@openhands-ai bot commented Jan 7, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

@xingyaoww (Collaborator, Author) commented:

@OpenHands help me open a new PR in OpenHands/OpenHands-CLI that will use this branch of the SDK, and use the summary from the action to replace the current action summary generated by the string-truncation heuristic

@openhands-ai bot commented Jan 7, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

@openhands-ai bot commented Jan 7, 2026

Summary

I've successfully created a new PR in OpenHands/OpenHands-CLI that uses the action summary feature from the SDK's feature/action-summary branch.

PR Created: OpenHands/OpenHands-CLI#303

Changes Made

  1. Updated SDK dependency (pyproject.toml):

    • Configured openhands-sdk and openhands-tools to use the feature/action-summary branch from OpenHands/software-agent-sdk (commit 19927a1)
  2. Modified _extract_meaningful_title() method (openhands_cli/tui/widgets/richlog_visualizer.py):

    • Added a check at the beginning of the method to prefer the LLM-generated summary field when available on events
    • Falls back to the existing string truncation heuristic when summary is not present
    • The summary is truncated to 70 characters if needed (see the sketch below)
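
A rough sketch of that preference order (the fallback helper name here is hypothetical; the real method contains the existing truncation logic inline):

def _extract_meaningful_title(self, event) -> str:
    # Prefer the LLM-generated summary when the SDK provides one.
    summary = getattr(event, "summary", None)
    if isinstance(summary, str) and summary.strip():
        title = summary.strip()
        return title if len(title) <= 70 else title[:69] + "…"
    # Otherwise fall back to the string-truncation heuristic
    # (represented here by a hypothetical helper).
    return self._fallback_truncated_title(event)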

Checklist

  • ✅ Uses the feature/action-summary branch of the SDK
  • ✅ Uses ActionEvent.summary instead of string truncation heuristic for action titles
  • ✅ Falls back gracefully when summary is not available
  • ✅ All 829 tests pass
  • ✅ Pre-commit hooks pass (ruff format, ruff lint, pycodestyle, pyright)
  • ✅ Changes are minimal and focused

Note

This PR is created as a draft since it depends on OpenHands/software-agent-sdk PR #1339 being merged first. Once that PR is merged, the SDK dependency should be updated to use the released version.

@openhands-ai bot commented Jan 7, 2026

Summary

The test test_security_risk_field_with_live_server is already passing on the current branch. I ran the test multiple times and it consistently passes:

tests/cross/test_remote_conversation_live_server.py::test_security_risk_field_with_live_server PASSED

The test failure you reported appears to have been a transient issue that has since been resolved. The test validates that:

  1. Actions execute successfully with security_risk provided
  2. Actions execute successfully without security_risk (defaults to UNKNOWN)
  3. ActionEvent always has a security_risk attribute

I also ran the full SDK and cross test suites (1706 tests) and all tests pass.

No code changes were needed - the branch feature/action-summary is already in a working state with all tests passing.

@xingyaoww (Collaborator, Author) commented:

@OpenHands please fix the failing actions on PR #1339 at branch feature/action-summary

@openhands-ai bot commented Jan 7, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

ActionT = TypeVar("ActionT", bound=Action)
ObservationT = TypeVar("ObservationT", bound=Observation)
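# Cache dynamically created Action subclasses so each augmented type
# (risk or summary variant) is built only once and then reused.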
_action_types_with_risk: dict[type, type] = {}
_action_types_with_summary: dict[type, type] = {}
Inline review comment (Collaborator):
nice with the caching!

@malhotra5 (Collaborator) left a comment:

LGTM!

The MCP tests were timing out in CI because uvx may need to download
packages on first run. Increased timeout from default 30s to 120s.

Co-authored-by: openhands <[email protected]>
@xingyaoww xingyaoww added the integration-test Runs the integration tests and comments the results label Jan 7, 2026
@github-actions bot (Contributor) commented Jan 7, 2026

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@openhands-ai bot commented Jan 7, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Agent Server

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1339 at branch `feature/action-summary`

Feel free to include any additional details that might help me get this PR into a better state.

@github-actions bot (Contributor) commented Jan 7, 2026

🧪 Integration Tests Results

Overall Success Rate: 98.0%
Total Cost: $2.10
Models Tested: 6
Timestamp: 2026-01-07 18:07:20 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model                                         Overall  Integration (Required)  Behavior (Optional)  Tests Passed  Skipped  Total  Cost   Tokens
litellm_proxy_deepseek_deepseek_chat          100.0%   100.0%                  N/A                  8/8           1        9      $0.07  691,007
litellm_proxy_mistral_devstral_2512           87.5%    87.5%                   N/A                  7/8           1        9      $0.19  460,315
litellm_proxy_claude_sonnet_4_5_20250929      100.0%   100.0%                  N/A                  9/9           0        9      $0.65  495,177
litellm_proxy_vertex_ai_gemini_3_pro_preview  100.0%   100.0%                  N/A                  9/9           0        9      $0.64  418,357
litellm_proxy_gpt_5.1_codex_max               100.0%   100.0%                  N/A                  8/8           1        9      $0.26  278,531
litellm_proxy_moonshot_kimi_k2_thinking       100.0%   100.0%                  N/A                  8/8           1        9      $0.29  438,950

📋 Detailed Results

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.07
  • Token Usage: prompt: 676,231, completion: 14,776, cache_read: 623,488
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_225c0eb_deepseek_run_N9_20260107_175757
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 87.5% (7/8)
  • Integration Tests (Required): 87.5% (7/9)
  • Total Cost: $0.19
  • Token Usage: prompt: 455,337, completion: 4,978
  • Run Suffix: litellm_proxy_mistral_devstral_2512_225c0eb_devstral_2512_run_N9_20260107_180133
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello ⚠️ REQUIRED: Shell script is not executable (Cost: $0.009)

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.65
  • Token Usage: prompt: 482,104, completion: 13,073, cache_read: 391,130, cache_write: 90,154, reasoning: 3,439
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_225c0eb_sonnet_run_N9_20260107_180138

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 100.0% (9/9)
  • Integration Tests (Required): 100.0% (9/9)
  • Total Cost: $0.64
  • Token Usage: prompt: 396,338, completion: 22,019, cache_read: 233,224, reasoning: 17,103
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_225c0eb_gemini_3_pro_run_N9_20260107_175813

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.26
  • Token Usage: prompt: 271,457, completion: 7,074, cache_read: 133,888, reasoning: 4,352
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_225c0eb_gpt51_codex_run_N9_20260107_175903
  • Skipped Tests: 1

Skipped Tests:

  • t09_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 100.0% (8/8)
  • Integration Tests (Required): 100.0% (8/9)
  • Total Cost: $0.29
  • Token Usage: prompt: 427,021, completion: 11,929, cache_read: 361,216
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_225c0eb_kimi_k2_run_N9_20260107_175750
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

@xingyaoww xingyaoww merged commit 1220936 into main Jan 7, 2026
46 of 47 checks passed
@xingyaoww xingyaoww deleted the feature/action-summary branch January 7, 2026 18:18