Skip to content

Splits Tests More Evenly, Only Runs Relevant Ones, Adds Reporting#5059

Merged
myurasov-nv merged 107 commits intoisaac-sim:developfrom
myurasov-nv:my-gated-tests-1
Mar 24, 2026
Merged

Splits Tests More Evenly, Only Runs Relevant Ones, Adds Reporting#5059
myurasov-nv merged 107 commits intoisaac-sim:developfrom
myurasov-nv:my-gated-tests-1

Conversation

@myurasov-nv
Copy link
Copy Markdown
Collaborator

@myurasov-nv myurasov-nv commented Mar 18, 2026

Description

Until now every PR ran all GPU test jobs regardless of what changed. That is no-good and needs to change. :)

This PR:

  • Splits tests more evenly and according to functionality
  • Only runs relevant ones; manually all can still be executed
  • Adds reporting of the passed/failed tests on the Workflow page
  • Improves job cancellation so resources are freed faster

Detailed changes

  • Added a detect-changes job that runs git diff --name-only to produce per-package true/false outputs; each test job skips itself when its package is untouched. Core/infra changes still trigger everything. Manual "run all" option available via GH UI.
  • Split test-general into per-package jobs: test-isaaclab (2 shards), test-isaaclab-rl, -mimic, -contrib, -teleop, -visualizers, -assets. Each job is independently gated and sharded.
  • Replaced hardcoded include-files whitelists with deterministic shard-index % shard-count sharding, fixing 7 silently skipped isaaclab_tasks tests.
  • Broke out cuRobo Planner and SkillGen into separate test jobs with their own conditional execution gates.
  • Fixed conftest.py exclude pattern that was matching a comma-separated string as one substring (never matched).
  • Container is now killed on cancellation signals so jobs stop quickly via GH UI.
  • JUnit results are converted to Markdown and displayed on the GH Actions job page, including skip reasons.
  • Faster code checkout and updated actions/checkout@v4, actions/upload-artifact@v4, actions/download-artifact@v4 to latest SHA-pinned versions.
  • Extracted reusable composite action for repeating Docker test-run boilerplate.
  • Added Docker image pull timing for diagnostics.
  • Suppressed Node.js deprecation warnings in CI output.

How test_train_environments.py is split

The test function test_train_environments is parametrized by task_spec (all registered Isaac environments) and workflow (rl_games, rsl_rl, sb3, skrl), so pytest expands it into N_tasks x 4_workflows individual test items. The pytest_collection_modifyitems hook in conftest.py runs after pytest builds this full list and keeps only every Nth item based on TEST_SHARD_INDEX / TEST_SHARD_COUNT env vars (e.g. shard 0/3 gets items 0, 3, 6, ...). Each CI runner sees ~1/3 of the parametrized combinations, cutting wall time from ~1.2h to ~24min per shard.

Type of change

  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@myurasov-nv myurasov-nv changed the title Splits Tests in More Even Portions, Only Runs Relevant Ones [POC] Splits Tests in More Even Portions, Only Runs Relevant Ones Mar 18, 2026
@myurasov-nv myurasov-nv changed the title [POC] Splits Tests in More Even Portions, Only Runs Relevant Ones [POC] Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation Mar 18, 2026
@myurasov-nv myurasov-nv changed the title [POC] Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation, Adds Reporting Mar 19, 2026
@myurasov-nv myurasov-nv marked this pull request as ready for review March 19, 2026 21:42
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR substantially overhauls the CI pipeline for Isaac Lab: it introduces a lightweight detect-changes job that gates every GPU test job on per-package file-diff outputs, splits the monolithic test-general job into 10+ independently-triggered per-package jobs (several sharded 3-way), extracts a reusable run-package-tests composite action, adds $GITHUB_STEP_SUMMARY reporting via a new junit_summary.py script, and fixes a long-standing conftest bug that caused isaaclab_tasks shards to silently skip tests.

Key changes and observations:

  • detect-changes job: Uses git diff --name-only FETCH_HEAD HEAD with fetch-depth: 1; correctly falls back to running all tests when the diff is empty or the event is not a PR. The has() helper function is correctly written — set -e is not inherited inside command substitutions by default in bash, so grep returning 1 (no match) is handled safely by the local rc=$? capture.
  • 3-shard parallelism for isaaclab_tasks and isaaclab (core) cuts wall time from ~1.2 h to ~24 min per shard. Sharding is deterministic (sorted file list, modulo), and the benchmarking conftest.py applies complementary test-item-level sharding for parametrized tests like test_train_environments.py.
  • run-package-tests composite action eliminates ~50 lines of duplicated boilerplate per job. The action no longer uploads raw JUnit XML files as GitHub Actions artifacts — results are surfaced only via dorny/test-reporter annotations and step summaries. This is a deliberate design choice but removes the ability to download raw XMLs post-run.
  • junit_summary.py: Correctly uses f-strings throughout. When all tests pass, there is no top-level status banner in the Step Summary — only a collapsed <details> section, which is a minor UX gap.
  • tools/conftest.py fix: the old exclude-pattern treated "isaaclab_tasks,isaaclab_newton,isaaclab_physx" as a single substring (never matched); the new comma-split logic works correctly. File-level sharding is correctly bypassed when include-files is set, delegating to test-item-level sharding in benchmarking/conftest.py.

Confidence Score: 4/5

  • PR is safe to merge; the new CI architecture is sound and the changes are non-breaking infrastructure improvements.
  • The core logic — change detection, sharding, composite action extraction, and JUnit reporting — is all correct. The three comments left are P2 style/UX observations (missing all-pass header, no artifact upload, and a defensive suggestion for the has() error branch) that do not block functionality. The previously raised concerns about SHA-pinning and fetch-depth have been acknowledged upstream. No runtime bugs were found in the Python, shell, or YAML code.
  • .github/workflows/build.yaml — the detect-changes has() helper and the overall job-gating logic are the most novel and consequential code; worth a final read-through before merge.

Important Files Changed

Filename Overview
.github/workflows/build.yaml Major restructuring: adds detect-changes job for per-package gating, splits test-general into 10+ per-package jobs (each independently triggered), adds 3-shard parallelism for isaaclab_tasks and isaaclab (core), and removes the old combine-results pattern in favour of dorny/test-reporter. The has() helper function may silently misbehave if inherit_errexit is active on the runner's bash.
.github/actions/run-package-tests/action.yml New composite action that wraps ECR pull, run-tests, dorny/test-reporter annotation, and fork-PR failure gate into a single reusable step. Removes per-job artifact uploads (test XMLs are no longer downloadable post-run). Adds pull-duration diagnostics and cancellation cleanup.
.github/actions/run-tests/action.yml Adds shard-index/shard-count inputs and env var forwarding, removes deprecated cuda-issue-only input, fixes the "not " prefix check to avoid spurious matches on words starting with "not", adds foreground Docker run with --init --stop-timeout 5 for faster signal handling, and improves fallback XML combination logic.
.github/actions/run-tests/junit_summary.py New script that parses a JUnit XML and writes a Markdown summary to $GITHUB_STEP_SUMMARY. Correctly uses f-strings; handles failures, errors, skips, and passes with collapsible sections. Minor: when all tests pass there is no top-level status banner, only a collapsed green <details> block.
tools/conftest.py Adds file-level sharding via TEST_SHARD_INDEX/TEST_SHARD_COUNT env vars; sharding is correctly bypassed when include_files is set (deferring to benchmarking conftest's test-item sharding). Also fixes the exclude-pattern split that previously matched a comma-separated string as a single substring.
source/isaaclab_tasks/test/benchmarking/conftest.py Adds pytest_collection_modifyitems hook for test-item-level sharding of parametrized test cases (used by test_train_environments.py). Correctly reads TEST_SHARD_INDEX/TEST_SHARD_COUNT and filters items by modulo. Complements the file-level sharding in tools/conftest.py.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Push / PR / workflow_dispatch] --> DC[detect-changes\nubuntu-latest]

    DC -->|always| BUILD[build\nBuild Base Docker Image]
    DC -->|mimic/tasks/core/infra| BCUROBO[build-curobo\nBuild cuRobo Image]

    BUILD --> T1[test-isaaclab-tasks 1/3\ntasks/core/infra]
    BUILD --> T2[test-isaaclab-tasks 2/3\ntasks/core/infra]
    BUILD --> T3[test-isaaclab-tasks 3/3\ntasks/core/infra]
    BUILD --> C1[test-isaaclab-core 1/3\ncore/infra]
    BUILD --> C2[test-isaaclab-core 2/3\ncore/infra]
    BUILD --> C3[test-isaaclab-core 3/3\ncore/infra]
    BUILD --> RL[test-isaaclab-rl\nrl/core/infra]
    BUILD --> MIM[test-isaaclab-mimic\nmimic/core/infra]
    BUILD --> CON[test-isaaclab-contrib\ncontrib/core/infra]
    BUILD --> TEL[test-isaaclab-teleop\nteleop/core/infra]
    BUILD --> VIS[test-isaaclab-visualizers\nvisualizers/core/infra]
    BUILD --> ASS[test-isaaclab-assets\nassets/core/infra]
    BUILD --> NEW[test-isaaclab-newton\nnewton/core/infra]
    BUILD --> PHY[test-isaaclab-physx\nphysx/core/infra]
    BUILD --> TET[test-environments-training\ntasks/core/infra]
    BCUROBO --> CUR[test-curobo\nmimic/core/infra]
    BCUROBO --> SKL[test-skillgen\nmimic/tasks/core/infra]

    subgraph "Per-job reporting (each job)"
        RPT[run-package-tests composite\n→ ECR pull\n→ pytest in Docker\n→ dorny/test-reporter\n→ GITHUB_STEP_SUMMARY via junit_summary.py\n→ Fork PR failure gate]
    end

    T1 & T2 & T3 & C1 & C2 & C3 & RL & MIM & CON & TEL & VIS & ASS & NEW & PHY & TET & CUR & SKL --> RPT
Loading

Reviews (9): Last reviewed commit: "Typo in change detection - collect apps/..." | Re-trigger Greptile

@github-actions github-actions bot added the isaac-lab Related to Isaac Lab team label Mar 19, 2026
@myurasov-nv
Copy link
Copy Markdown
Collaborator Author

@greptile-apps

@myurasov-nv
Copy link
Copy Markdown
Collaborator Author

@greptile-apps

2 similar comments
@myurasov-nv
Copy link
Copy Markdown
Collaborator Author

@greptile-apps

@myurasov-nv
Copy link
Copy Markdown
Collaborator Author

@greptile-apps

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 20, 2026
@myurasov-nv myurasov-nv requested a review from ooctipus March 20, 2026 03:50
@AntoineRichard
Copy link
Copy Markdown
Collaborator

@greptile

@myurasov-nv myurasov-nv changed the title Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation, Adds Reporting Splits Tests More Evenly, Only Runs Relevant Ones, Adds Reporting Mar 21, 2026
@myurasov-nv
Copy link
Copy Markdown
Collaborator Author

@greptile-apps

@myurasov-nv
Copy link
Copy Markdown
Collaborator Author

@greptile-apps

@@ -1,103 +0,0 @@
# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are removing this job, are the test jobs going to show fail? we previously set them up to continue on failure so that this combine-results job executes.

also, where are the final reports being presented with this removed?

Copy link
Copy Markdown
Collaborator

@ooctipus ooctipus Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems pretty clear to me to scroll and see the pass or failure on each tests, as I am reviewing, I am also trying to understand what is the benefit of /combine-results/action.yml beside it creates one more entry of pass and fail? It seems removable for me unless I am missing some important consideration.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was supposed to generate a full report with the exact test cases that failed and their logs. it never really worked too well though, but if we were able to get that working, then it'll save us from going into each job, scrolling to the bottom to see which tests failed and why.

but I'm more concerned about capturing the fail/pass correctly, since before we depended on the combine-results job to check the overall pass/fail

Copy link
Copy Markdown
Collaborator Author

@myurasov-nv myurasov-nv Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing jobs are displaying results individually, so that's not really needed for capturing the results. If fact it succeeds always regardless of testing jobs. It was indeed combining JUnit XML files into one. Do we need that report? My suggestion is to bring it back once we do need that report. :)

Copy link
Copy Markdown
Collaborator

@ooctipus ooctipus Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what kelly is suggesting is that we can go through each individual test to check the JUnit test result details when more inspection is needed, but if we have one summary report that collects all failure, it could be convenient to reference directly(think about case where we just need to copy one table and paste to github,chat,bugs and ask someone to fix rather than manually going through each tests and collect). But I do agree if you feel like this needs another PR to support, we don't have to have it in right now.

@myurasov-nv
Copy link
Copy Markdown
Collaborator Author

@greptile-apps

@myurasov-nv myurasov-nv requested a review from kellyguo11 March 23, 2026 21:19
@myurasov-nv myurasov-nv merged commit b0c9894 into isaac-sim:develop Mar 24, 2026
21 of 28 checks passed
myurasov-nv added a commit that referenced this pull request Mar 24, 2026
)

# Description

Until now every PR ran **all** GPU test jobs regardless of what changed.
That is no-good and needs to change. :)

This PR:
- Splits tests more evenly and according to functionality
- Only runs relevant ones; manually all can still be executed
- Adds reporting of the passed/failed tests on the Workflow page
- Improves job cancellation so resources are freed faster

## Detailed changes

- Added a `detect-changes` job that runs `git diff --name-only` to
produce per-package `true`/`false` outputs; each test job skips itself
when its package is untouched. Core/infra changes still trigger
everything. Manual "run all" option available via GH UI.
- Split `test-general` into per-package jobs: `test-isaaclab` (2
shards), `test-isaaclab-rl`, `-mimic`, `-contrib`, `-teleop`,
`-visualizers`, `-assets`. Each job is independently gated and sharded.
- Replaced hardcoded `include-files` whitelists with deterministic
`shard-index % shard-count` sharding, fixing 7 silently skipped
`isaaclab_tasks` tests.
- Broke out cuRobo Planner and SkillGen into separate test jobs with
their own conditional execution gates.
- Fixed `conftest.py` exclude pattern that was matching a
comma-separated string as one substring (never matched).
- Container is now killed on cancellation signals so jobs stop quickly
via GH UI.
- JUnit results are converted to Markdown and displayed on the GH
Actions job page, including skip reasons.
- Faster code checkout and updated `actions/checkout@v4`,
`actions/upload-artifact@v4`, `actions/download-artifact@v4` to latest
SHA-pinned versions.
- Extracted reusable composite action for repeating Docker test-run
boilerplate.
- Added Docker image pull timing for diagnostics.
- Suppressed Node.js deprecation warnings in CI output.

## How test_train_environments.py is split

The test function `test_train_environments` is parametrized by
*task_spec* (all registered Isaac environments) and *workflow*
(`rl_games`, `rsl_rl`, `sb3`, `skrl`), so pytest expands it into
`N_tasks x 4_workflows` individual test items. The
`pytest_collection_modifyitems` hook in conftest.py runs after pytest
builds this full list and keeps only every Nth item based on
`TEST_SHARD_INDEX` / `TEST_SHARD_COUNT` env vars (e.g. shard 0/3 gets
items 0, 3, 6, ...). Each CI runner sees ~1/3 of the parametrized
combinations, cutting wall time from ~1.2h to ~24min per shard.


## Type of change

- New feature (non-breaking change which adds functionality)
- Bug fix (non-breaking change which fixes an issue)

## Checklist

- [x] I have read and understood the [contribution
guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------

Signed-off-by: myurasov-nv <168484206+myurasov-nv@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Antoine Richard <antoiner@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation infrastructure isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants