test: use default agent preset for integration tests #1996

Open
xingyaoww wants to merge 5 commits into main from
openhands/use-default-preset-for-integration-tests

Conversation

@xingyaoww
Collaborator

@xingyaoww xingyaoww commented Feb 11, 2026

Summary

Update integration tests to use the default agent preset from openhands.tools.preset.default, ensuring tests validate the same agent configuration shipped to production (GUI/CLI).

Changes:

  • tests/integration/base.py: Updated BaseIntegrationTest to:

    • Import and use get_default_tools() from the preset
    • Add enable_browser property (default False) for tests to override
    • Make tools property non-abstract with default implementation using preset
  • Integration tests (t01-t08): Removed manual tool registration, now using default preset:

    • t01_fix_simple_typo.py
    • t02_add_bash_hello.py
    • t03_jupyter_write_file.py
    • t04_git_staging.py
    • t05_simple_browsing.py (with enable_browser=True)
    • t06_github_pr_browsing.py (with enable_browser=True)
    • t07_interactive_commands.py
    • t08_image_file_viewing.py
  • Behavior tests: Updated default_behavior_tools() in behavior_helpers.py to use preset

  • b05_do_not_create_redundant_files.py: Removed manual tool registration
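
The base-class change listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the local `get_default_tools` stand-in, its `enable_browser` parameter, and the tool names are hypothetical. Only the `enable_browser` property (default False) and the non-abstract `tools` property come from the description.

```python
from abc import ABC


def get_default_tools(enable_browser: bool = False) -> list[str]:
    """Hypothetical stand-in for the preset helper; returns tool names only."""
    tools = ["terminal", "file_editor", "task_tracker"]  # assumed names
    if enable_browser:
        tools.append("browser_tool_set")  # assumed name
    return tools


class BaseIntegrationTest(ABC):
    @property
    def enable_browser(self) -> bool:
        # Default: no browser tools; subclasses override to opt in.
        return False

    @property
    def tools(self) -> list[str]:
        # No longer abstract: the default comes from the shared preset.
        return get_default_tools(enable_browser=self.enable_browser)


class SimpleTypoTest(BaseIntegrationTest):
    """Uses the preset as-is, with no manual tool registration."""


class SimpleBrowsingTest(BaseIntegrationTest):
    @property
    def enable_browser(self) -> bool:
        return True
```

With this shape, a test only states what differs from the preset, which is where the line savings would come from.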

Net change: -85 lines of code

This ensures integration tests validate the production agent configuration, catching any issues before they reach users.

Fixes #372

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---------|---------------|------------|-------------|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d459f4e-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d459f4e-python \
  ghcr.io/openhands/agent-server:d459f4e-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d459f4e-golang-amd64
ghcr.io/openhands/agent-server:d459f4e-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d459f4e-golang-arm64
ghcr.io/openhands/agent-server:d459f4e-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d459f4e-java-amd64
ghcr.io/openhands/agent-server:d459f4e-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d459f4e-java-arm64
ghcr.io/openhands/agent-server:d459f4e-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d459f4e-python-amd64
ghcr.io/openhands/agent-server:d459f4e-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:d459f4e-python-arm64
ghcr.io/openhands/agent-server:d459f4e-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:d459f4e-golang
ghcr.io/openhands/agent-server:d459f4e-java
ghcr.io/openhands/agent-server:d459f4e-python

About Multi-Architecture Support

  • Each variant tag (e.g., d459f4e-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., d459f4e-python-amd64) are also available if needed
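
The tag list above appears to follow a regular naming pattern. The sketch below reproduces it for one variant. The function names are hypothetical, and the substitution rules (`/` becomes `_s_`, `:` becomes `_tag_`) are inferred from the pushed tags, not taken from the build scripts.

```python
def sanitize_base_image(image: str) -> str:
    """Turn a base image reference into a tag-safe string (inferred rule)."""
    return image.replace("/", "_s_").replace(":", "_tag_")


def build_tags(sha: str, variant: str, base_image: str, archs: list[str]) -> list[str]:
    """List the per-arch tags, then the multi-arch manifest tag."""
    tags = []
    for arch in archs:
        tags.append(f"{sha}-{variant}-{arch}")
        tags.append(f"{sha}-{sanitize_base_image(base_image)}-{arch}")
    tags.append(f"{sha}-{variant}")  # multi-arch manifest
    return tags


for tag in build_tags(
    "d459f4e",
    "python",
    "nikolaik/python-nodejs:python3.12-nodejs22",
    ["amd64", "arm64"],
):
    print(f"ghcr.io/openhands/agent-server:{tag}")
```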

Update integration tests to use the default agent preset from
openhands.tools.preset.default, ensuring tests validate the same
agent configuration shipped to production (GUI/CLI).

Changes:
- Update BaseIntegrationTest to use get_default_tools() by default
- Add enable_browser property (default False) for tests to override
- Remove manual tool registration from t01-t08 tests
- Update behavior test helpers to use default preset
- Add enable_browser=True for browsing tests (t05, t06)

This ensures integration tests validate the production agent
configuration, catching any issues before they reach users.

Fixes #372

Co-authored-by: openhands <openhands@all-hands.dev>
Collaborator

@all-hands-bot all-hands-bot left a comment

🟡 Acceptable - Good Refactoring with Minor Design Smell

Verdict: ✅ Worth merging - Solid refactoring that removes duplication and aligns tests with production config.

Key Insight: The -85 line reduction and single source of truth for tool configuration is excellent. The enable_browser flag works but represents a scaling problem for future tool configurations.

See inline comments for specific feedback.

@openhands-ai

openhands-ai bot commented Feb 11, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • PR Review by OpenHands

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1996 at branch `openhands/use-default-preset-for-integration-tests`

Feel free to include any additional details that might help me get this PR into a better state.

Collaborator

@enyst enyst left a comment

Unless you'd like to negotiate with the reviewer agent, LGTM 😅

@xingyaoww xingyaoww added the integration-test Runs the integration tests and comments the results label Feb 11, 2026
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@xingyaoww xingyaoww added integration-test Runs the integration tests and comments the results and removed integration-test Runs the integration tests and comments the results labels Feb 13, 2026
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@enyst enyst added behavior-initiative This is related to the system prompt sections and LLM steering. and removed behavior-initiative This is related to the system prompt sections and LLM steering. labels Feb 14, 2026
@xingyaoww
Collaborator Author

@OpenHands please check the github workflow in this PR https://github.com/OpenHands/software-agent-sdk/actions/runs/21925641027/job/63317348622

Tell me why the integration tests in this PR never finish running.

@openhands-ai

openhands-ai bot commented Feb 15, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Feb 15, 2026

Analysis: Why the Integration Test Never Finishes

After analyzing the GitHub workflow logs and the PR changes, I've identified the root cause of why the integration tests are hanging for 6+ hours before being cancelled.

Root Cause

The PR changes t05_simple_browsing.py and t06_github_pr_browsing.py to use browser tools by setting enable_browser=True, which includes the full BrowserToolSet (14 browser-related tools).

The problem is:

  1. Original design: These tests were designed to work with terminal tools only (curl/wget). The test instruction "Browse localhost:8000, and tell me the ultimate answer to life" was meant to be completed using terminal commands.

  2. PR change: By setting enable_browser=True, the tests now include browser tools. The agent attempts to use actual browser automation (Chromium via Playwright) to complete the task.

  3. Why it hangs:

    • Each browser action has a 300-second (5-minute) timeout in BrowserToolExecutor.__call__
    • The test has max_iteration_per_run=100 iterations
    • Worst case: 100 iterations × 300 seconds = 30,000 seconds ≈ 8.3 hours per test
    • The job was cancelled after ~6 hours, which is within this worst-case bound
  4. Additional factors:

    • Browser operations are inherently slower than terminal commands
    • The agent may struggle to complete the task efficiently with browser tools
    • The test wasn't designed for browser-based interaction (the HTML page requires clicking a button to reveal the answer)

Evidence from Logs

  • Job started: 2026-02-11T22:31:34Z
  • Job cancelled: 2026-02-12T04:36:23Z (~6 hours later)
  • Step "Run integration test evaluation" was stuck in in_progress state
  • No output logs were captured for the test execution step
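
The elapsed time can be checked directly from the two timestamps above. The sketch uses only the standard library and writes the `Z` suffix as `+00:00` so it also parses on Python versions before 3.11.

```python
from datetime import datetime

started = datetime.fromisoformat("2026-02-11T22:31:34+00:00")
cancelled = datetime.fromisoformat("2026-02-12T04:36:23+00:00")

elapsed = cancelled - started
print(elapsed)  # prints "6:04:49"
```

The run lasted just over six hours. That is close to GitHub Actions' default job timeout of 360 minutes, so the cancellation may reflect that limit and not a point the test itself reached. This reading is an inference from the timing, not something the logs confirm.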

Recommended Fix

The tests t05_simple_browsing.py and t06_github_pr_browsing.py should NOT set enable_browser=True. They should use the default enable_browser=False and rely on terminal tools as originally designed.

Alternatively, if browser testing is desired:

  1. Create separate browser-specific tests with appropriate timeouts
  2. Reduce max_iteration_per_run for browser tests
  3. Add per-test timeouts to prevent runaway execution
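
One way to combine the iteration cap with a per-test timeout is a cooperative wall-clock budget around the agent loop. The sketch below is an illustration only: `run_with_deadline`, `step`, and the injectable `clock` are hypothetical names, not part of the SDK or the test harness.

```python
import time
from typing import Callable


def run_with_deadline(
    step: Callable[[], bool],
    max_iterations: int,
    deadline_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call step() until it reports completion, the iteration cap is hit,
    or the wall-clock budget is used up. Returns the reason for stopping."""
    start = clock()
    for _ in range(max_iterations):
        # Check the budget before starting another (possibly slow) step.
        if clock() - start >= deadline_s:
            return "timeout"
        if step():
            return "finished"
    return "max_iterations"


# Example: a step that never finishes, capped at 3 iterations.
print(run_with_deadline(lambda: False, max_iterations=3, deadline_s=60.0))
# prints "max_iterations"
```

The check is cooperative: it cannot interrupt a step already in progress, so a single action can still overrun the budget by up to its own timeout. A hard limit would need process-level enforcement or a job-level timeout in CI.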

@xingyaoww xingyaoww removed the integration-test Runs the integration tests and comments the results label Feb 15, 2026
@xingyaoww xingyaoww added the integration-test Runs the integration tests and comments the results label Feb 15, 2026 — with OpenHands AI
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Co-authored-by: openhands <openhands@all-hands.dev>
@xingyaoww xingyaoww removed the integration-test Runs the integration tests and comments the results label Feb 15, 2026
@xingyaoww xingyaoww added the integration-test Runs the integration tests and comments the results label Feb 15, 2026 — with OpenHands AI
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Labels

integration-test Runs the integration tests and comments the results

Development

Successfully merging this pull request may close these issues.

Test: use default agent preset for integration tests

4 participants