test: use default agent preset for integration tests by xingyaoww · Pull Request #1996 · OpenHands/software-agent-sdk

xingyaoww · 2026-02-11T09:23:25Z

Summary

Update integration tests to use the default agent preset from openhands.tools.preset.default, ensuring tests validate the same agent configuration shipped to production (GUI/CLI).

Changes:

- tests/integration/base.py: Updated BaseIntegrationTest to:
  - Import and use get_default_tools() from the preset
  - Add enable_browser property (default False) for tests to override
  - Make tools property non-abstract with default implementation using preset
- Integration tests (t01-t08): Removed manual tool registration, now using default preset:
  - t01_fix_simple_typo.py
  - t02_add_bash_hello.py
  - t03_jupyter_write_file.py
  - t04_git_staging.py
  - t05_simple_browsing.py (with enable_browser=True)
  - t06_github_pr_browsing.py (with enable_browser=True)
  - t07_interactive_commands.py
  - t08_image_file_viewing.py
- Behavior tests: Updated default_behavior_tools() in behavior_helpers.py to use preset
- b05_do_not_create_redundant_files.py: Removed manual tool registration
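
For readers who have not opened the diff, the pattern described above can be sketched roughly as follows. This is a simplified illustration, not the code in tests/integration/base.py: the `get_default_tools` function below is a local stand-in for the preset helper (its signature and the tool names it returns are assumptions), and the class names are hypothetical.

```python
def get_default_tools(enable_browser: bool = False) -> list[str]:
    """Local stand-in for the preset helper; returns placeholder tool names."""
    tools = ["terminal", "file_editor", "task_tracker"]  # placeholder names
    if enable_browser:
        tools.append("browser_tool_set")  # placeholder name
    return tools


class BaseIntegrationTestSketch:
    """Simplified base class: tools come from the preset unless overridden."""

    @property
    def enable_browser(self) -> bool:
        # Browser tools are off by default; a test opts in by overriding this.
        return False

    @property
    def tools(self) -> list[str]:
        # Non-abstract default: delegate to the shared preset.
        return get_default_tools(enable_browser=self.enable_browser)


class TypoFixTestSketch(BaseIntegrationTestSketch):
    """A test that needs no manual tool registration."""


class BrowsingTestSketch(BaseIntegrationTestSketch):
    """A test that opts in to browser tools."""

    @property
    def enable_browser(self) -> bool:
        return True


print(TypoFixTestSketch().tools)
print(BrowsingTestSketch().tools)
```

With this shape, a test only has to override one property to change its tool set, which is where the line reduction comes from.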

Net change: -85 lines of code

This ensures integration tests validate the production agent configuration, catching any issues before they reach users.

Fixes #372

Checklist

If the PR is changing/adding functionality, are there tests to reflect this?
If there is an example, have you run the example to make sure that it works?
If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
Is the github CI passing?

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.12-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d459f4e-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d459f4e-python \
  ghcr.io/openhands/agent-server:d459f4e-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d459f4e-golang-amd64
ghcr.io/openhands/agent-server:d459f4e-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d459f4e-golang-arm64
ghcr.io/openhands/agent-server:d459f4e-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d459f4e-java-amd64
ghcr.io/openhands/agent-server:d459f4e-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d459f4e-java-arm64
ghcr.io/openhands/agent-server:d459f4e-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d459f4e-python-amd64
ghcr.io/openhands/agent-server:d459f4e-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:d459f4e-python-arm64
ghcr.io/openhands/agent-server:d459f4e-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:d459f4e-golang
ghcr.io/openhands/agent-server:d459f4e-java
ghcr.io/openhands/agent-server:d459f4e-python

About Multi-Architecture Support

Each variant tag (e.g., d459f4e-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., d459f4e-python-amd64) are also available if needed

Update integration tests to use the default agent preset from openhands.tools.preset.default, ensuring tests validate the same agent configuration shipped to production (GUI/CLI).

Changes:
- Update BaseIntegrationTest to use get_default_tools() by default
- Add enable_browser property (default False) for tests to override
- Remove manual tool registration from t01-t08 tests
- Update behavior test helpers to use default preset
- Add enable_browser=True for browsing tests (t05, t06)

This ensures integration tests validate the production agent configuration, catching any issues before they reach users.

Fixes #372

Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot

🟡 Acceptable - Good Refactoring with Minor Design Smell

Verdict: ✅ Worth merging - Solid refactoring that removes duplication and aligns tests with production config.

Key Insight: The -85 line reduction and single source of truth for tool configuration is excellent. The enable_browser flag works but represents a scaling problem for future tool configurations.

See inline comments for specific feedback.

tests/integration/base.py

tests/integration/tests/t05_simple_browsing.py

tests/integration/utils/behavior_helpers.py

openhands-ai · 2026-02-11T09:33:18Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- PR Review by OpenHands

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1996 at branch `openhands/use-default-preset-for-integration-tests`

Feel free to include any additional details that might help me get this PR into a better state.

enyst

Unless you'd like to negotiate with the reviewer agent, LGTM 😅

github-actions · 2026-02-11T22:31:11Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2026-02-13T10:57:48Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

xingyaoww · 2026-02-15T09:39:57Z

@OpenHands please check the github workflow in this PR https://github.com/OpenHands/software-agent-sdk/actions/runs/21925641027/job/63317348622

Tell me why the integration test in this PR never finishes running.

openhands-ai · 2026-02-15T09:40:05Z

I'm on it! xingyaoww can track my progress at all-hands.dev

openhands-ai · 2026-02-15T09:45:24Z

Analysis: Why the Integration Test Never Finishes

After analyzing the GitHub workflow logs and the PR changes, I've identified the root cause of why the integration tests are hanging for 6+ hours before being cancelled.

Root Cause

The PR changes t05_simple_browsing.py and t06_github_pr_browsing.py to use browser tools by setting enable_browser=True, which includes the full BrowserToolSet (14 browser-related tools).

The problem is:

- Original design: These tests were designed to work with terminal tools only (curl/wget). The test instruction "Browse localhost:8000, and tell me the ultimate answer to life" was meant to be completed using terminal commands.
- PR change: By setting enable_browser=True, the tests now include browser tools. The agent attempts to use actual browser automation (Chromium via Playwright) to complete the task.
- Why it hangs:
  - Each browser action has a 300-second (5-minute) timeout in BrowserToolExecutor.__call__
  - The test allows up to max_iteration_per_run=100 iterations
  - Worst case: 100 iterations × 300 seconds = 30,000 seconds ≈ 8.3 hours per test
  - The job was cancelled after ~6 hours, before that worst-case bound was reached, so the cancellation is consistent with this calculation
- Additional factors:
  - Browser operations are inherently slower than terminal commands
  - The agent may struggle to complete the task efficiently with browser tools
  - The test wasn't designed for browser-based interaction (the HTML page requires clicking a button to reveal the answer)
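
The worst-case figure above can be reproduced with a few lines of arithmetic. The constant names are illustrative; the values (100 iterations, 300 seconds per browser action) are the ones quoted in this analysis.

```python
MAX_ITERATIONS = 100            # max_iteration_per_run quoted above
BROWSER_ACTION_TIMEOUT_S = 300  # per-action timeout quoted above

# Upper bound: every iteration is a browser action that runs to its timeout.
worst_case_seconds = MAX_ITERATIONS * BROWSER_ACTION_TIMEOUT_S
worst_case_hours = worst_case_seconds / 3600

print(f"{worst_case_seconds} s = {worst_case_hours:.1f} h")  # 30000 s = 8.3 h
```

This is only an upper bound under the assumption that each iteration hits the full timeout; real runs would normally finish sooner.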

Evidence from Logs

Job started: 2026-02-11T22:31:34Z
Job cancelled: 2026-02-12T04:36:23Z (~6 hours later)
Step "Run integration test evaluation" was stuck in in_progress state
No output logs were captured for the test execution step
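
As a quick check of the "~6 hours" figure, the elapsed time between the two timestamps above can be computed directly. The trailing `Z` is written as `+00:00` so the parsing works on older Python versions as well.

```python
from datetime import datetime

# Timestamps copied from the log evidence above (UTC).
started = datetime.fromisoformat("2026-02-11T22:31:34+00:00")
cancelled = datetime.fromisoformat("2026-02-12T04:36:23+00:00")

elapsed = cancelled - started
print(elapsed)  # 6:04:49
```

GitHub Actions documents a default `timeout-minutes` of 360 for jobs, so a cancellation near the six-hour mark may reflect that limit rather than the agent's own iteration budget; the logs quoted here do not say which.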

Recommended Fix

The tests t05_simple_browsing.py and t06_github_pr_browsing.py should NOT set enable_browser=True. They should use the default enable_browser=False and rely on terminal tools as originally designed.

Alternatively, if browser testing is desired:

Create separate browser-specific tests with appropriate timeouts
Reduce max_iteration_per_run for browser tests
Add per-test timeouts to prevent runaway execution
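
One way the last two suggestions could look is a loop that stops on either an iteration cap or a wall-clock deadline. This is a minimal sketch, not code from the SDK: `run_with_budget`, its parameters, and the injectable clock are all hypothetical names introduced for illustration.

```python
import time
from typing import Callable


def run_with_budget(
    step: Callable[[], bool],
    max_iterations: int,
    deadline_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call step() until it returns True, the iteration cap is hit, or time runs out."""
    start = clock()
    for _ in range(max_iterations):
        # The deadline is checked between steps, before each new step starts.
        if clock() - start >= deadline_s:
            return "timed_out"
        if step():
            return "finished"
    return "iteration_limit"


# Demo with a fake clock that advances 300 "seconds" per reading,
# mimicking a step that always runs into a 300-second action timeout.
ticks = iter(range(0, 10_000, 300))
outcome = run_with_budget(
    step=lambda: False,  # the task never completes
    max_iterations=100,
    deadline_s=900,      # 15-minute budget for the whole test
    clock=lambda: next(ticks),
)
print(outcome)  # timed_out
```

Because the deadline is only checked between steps, this bounds total runtime to roughly the budget plus one step; a single step that blocks indefinitely would still need its own timeout.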

github-actions · 2026-02-15T09:50:31Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2026-02-15T11:46:54Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

openhands-ai bot mentioned this pull request Feb 11, 2026

Test: use default agent preset for integration tests #372

Open

all-hands-bot reviewed Feb 11, 2026

xingyaoww and others added 2 commits February 11, 2026 16:21

Merge branch 'main' into openhands/use-default-preset-for-integration…

eff042e

…-tests

Merge branch 'main' into openhands/use-default-preset-for-integration…

0c1c46a

…-tests

enyst approved these changes Feb 11, 2026

xingyaoww added the integration-test Runs the integration tests and comments the results label Feb 11, 2026

Merge branch 'main' into openhands/use-default-preset-for-integration…

fdf88a0

…-tests

xingyaoww added integration-test Runs the integration tests and comments the results and removed integration-test Runs the integration tests and comments the results labels Feb 13, 2026

enyst added behavior-initiative This is related to the system prompt sections and LLM steering. and removed behavior-initiative This is related to the system prompt sections and LLM steering. labels Feb 14, 2026

xingyaoww removed the integration-test Runs the integration tests and comments the results label Feb 15, 2026

xingyaoww added the integration-test Runs the integration tests and comments the results label Feb 15, 2026 — with OpenHands AI

test: temporarily disable non-browser tests to debug browser test hang

c9ef6c9

Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww removed the integration-test Runs the integration tests and comments the results label Feb 15, 2026

xingyaoww added the integration-test Runs the integration tests and comments the results label Feb 15, 2026 — with OpenHands AI

xingyaoww wants to merge 5 commits into main from openhands/use-default-preset-for-integration-tests

4 participants