test: use default agent preset for integration tests#1996
test: use default agent preset for integration tests#1996
Conversation
Update integration tests to use the default agent preset from openhands.tools.preset.default, ensuring tests validate the same agent configuration shipped to production (GUI/CLI). Changes: - Update BaseIntegrationTest to use get_default_tools() by default - Add enable_browser property (default False) for tests to override - Remove manual tool registration from t01-t08 tests - Update behavior test helpers to use default preset - Add enable_browser=True for browsing tests (t05, t06) This ensures integration tests validate the production agent configuration, catching any issues before they reach users. Fixes #372 Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable - Good Refactoring with Minor Design Smell
Verdict: ✅ Worth merging - Solid refactoring that removes duplication and aligns tests with production config.
Key Insight: The -85 line reduction and single source of truth for tool configuration is excellent. The enable_browser flag works but represents a scaling problem for future tool configurations.
See inline comments for specific feedback.
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
enyst
left a comment
There was a problem hiding this comment.
Unless you'd like to negotiate with the reviewer agent, LGTM 😅
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
|
@OpenHands please check the github workflow in this PR https://github.com/OpenHands/software-agent-sdk/actions/runs/21925641027/job/63317348622 Tell me why the integration test in this PR never finish running? |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
Analysis: Why the Integration Test Never FinishesAfter analyzing the GitHub workflow logs and the PR changes, I've identified the root cause of why the integration tests are hanging for 6+ hours before being cancelled. Root CauseThe PR changes The problem is:
Evidence from Logs
Recommended FixThe tests Alternatively, if browser testing is desired:
|
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
Co-authored-by: openhands <openhands@all-hands.dev>
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
Summary
Update integration tests to use the default agent preset from
openhands.tools.preset.default, ensuring tests validate the same agent configuration shipped to production (GUI/CLI).Changes:
tests/integration/base.py: UpdatedBaseIntegrationTestto:get_default_tools()from the presetenable_browserproperty (defaultFalse) for tests to overridetoolsproperty non-abstract with default implementation using presetIntegration tests (t01-t08): Removed manual tool registration, now using default preset:
t01_fix_simple_typo.pyt02_add_bash_hello.pyt03_jupyter_write_file.pyt04_git_staging.pyt05_simple_browsing.py(withenable_browser=True)t06_github_pr_browsing.py(withenable_browser=True)t07_interactive_commands.pyt08_image_file_viewing.pyBehavior tests: Updated
default_behavior_tools()inbehavior_helpers.pyto use presetb05_do_not_create_redundant_files.py: Removed manual tool registrationNet change: -85 lines of code
This ensures integration tests validate the production agent configuration, catching any issues before they reach users.
Fixes #372
Checklist
@xingyaoww can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:d459f4e-pythonRun
All tags pushed for this build
About Multi-Architecture Support
d459f4e-python) is a multi-arch manifest supporting both amd64 and arm64d459f4e-python-amd64) are also available if needed