Splits Tests More Evenly, Only Runs Relevant Ones, Adds Reporting by myurasov-nv · Pull Request #5059 · isaac-sim/IsaacLab

myurasov-nv · 2026-03-18T19:07:31Z

Description

Until now every PR ran all GPU test jobs regardless of what changed. That is no-good and needs to change. :)

This PR:

Splits tests more evenly and according to functionality
Only runs relevant ones; manually all can still be executed
Adds reporting of the passed/failed tests on the Workflow page
Improves job cancellation so resources are freed faster

Detailed changes

Added a detect-changes job that runs git diff --name-only to produce per-package true/false outputs; each test job skips itself when its package is untouched. Core/infra changes still trigger everything. Manual "run all" option available via GH UI.
Split test-general into per-package jobs: test-isaaclab (2 shards), test-isaaclab-rl, -mimic, -contrib, -teleop, -visualizers, -assets. Each job is independently gated and sharded.
Replaced hardcoded include-files whitelists with deterministic shard-index % shard-count sharding, fixing 7 silently skipped isaaclab_tasks tests.
Broke out cuRobo Planner and SkillGen into separate test jobs with their own conditional execution gates.
Fixed conftest.py exclude pattern that was matching a comma-separated string as one substring (never matched).
Container is now killed on cancellation signals so jobs stop quickly via GH UI.
JUnit results are converted to Markdown and displayed on the GH Actions job page, including skip reasons.
Faster code checkout and updated actions/checkout@v4, actions/upload-artifact@v4, actions/download-artifact@v4 to latest SHA-pinned versions.
Extracted reusable composite action for repeating Docker test-run boilerplate.
Added Docker image pull timing for diagnostics.
Suppressed Node.js deprecation warnings in CI output.

How test_train_environments.py is split

The test function test_train_environments is parametrized by task_spec (all registered Isaac environments) and workflow (rl_games, rsl_rl, sb3, skrl), so pytest expands it into N_tasks x 4_workflows individual test items. The pytest_collection_modifyitems hook in conftest.py runs after pytest builds this full list and keeps only every Nth item based on TEST_SHARD_INDEX / TEST_SHARD_COUNT env vars (e.g. shard 0/3 gets items 0, 3, 6, ...). Each CI runner sees ~1/3 of the parametrized combinations, cutting wall time from ~1.2h to ~24min per shard.

Type of change

New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)

Checklist

I have read and understood the contribution guidelines
I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have updated the changelog and the corresponding version in the extension's config/extension.toml file
I have added my name to the CONTRIBUTORS.md or my name already exists there

greptile-apps · 2026-03-19T21:48:01Z

Greptile Summary

This PR substantially overhauls the CI pipeline for Isaac Lab: it introduces a lightweight detect-changes job that gates every GPU test job on per-package file-diff outputs, splits the monolithic test-general job into 10+ independently-triggered per-package jobs (several sharded 3-way), extracts a reusable run-package-tests composite action, adds $GITHUB_STEP_SUMMARY reporting via a new junit_summary.py script, and fixes a long-standing conftest bug that caused isaaclab_tasks shards to silently skip tests.

Key changes and observations:

detect-changes job: Uses git diff --name-only FETCH_HEAD HEAD with fetch-depth: 1; correctly falls back to running all tests when the diff is empty or the event is not a PR. The has() helper function is correctly written — set -e is not inherited inside command substitutions by default in bash, so grep returning 1 (no match) is handled safely by the local rc=$? capture.
3-shard parallelism for isaaclab_tasks and isaaclab (core) cuts wall time from ~1.2 h to ~24 min per shard. Sharding is deterministic (sorted file list, modulo), and the benchmarking conftest.py applies complementary test-item-level sharding for parametrized tests like test_train_environments.py.
run-package-tests composite action eliminates ~50 lines of duplicated boilerplate per job. The action no longer uploads raw JUnit XML files as GitHub Actions artifacts — results are surfaced only via dorny/test-reporter annotations and step summaries. This is a deliberate design choice but removes the ability to download raw XMLs post-run.
junit_summary.py: Correctly uses f-strings throughout. When all tests pass, there is no top-level status banner in the Step Summary — only a collapsed <details> section, which is a minor UX gap.
tools/conftest.py fix: the old exclude-pattern treated "isaaclab_tasks,isaaclab_newton,isaaclab_physx" as a single substring (never matched); the new comma-split logic works correctly. File-level sharding is correctly bypassed when include-files is set, delegating to test-item-level sharding in benchmarking/conftest.py.

Confidence Score: 4/5

PR is safe to merge; the new CI architecture is sound and the changes are non-breaking infrastructure improvements.
The core logic — change detection, sharding, composite action extraction, and JUnit reporting — is all correct. The three comments left are P2 style/UX observations (missing all-pass header, no artifact upload, and a defensive suggestion for the has() error branch) that do not block functionality. The previously raised concerns about SHA-pinning and fetch-depth have been acknowledged upstream. No runtime bugs were found in the Python, shell, or YAML code.
.github/workflows/build.yaml — the detect-changes has() helper and the overall job-gating logic are the most novel and consequential code; worth a final read-through before merge.

Important Files Changed

Filename	Overview
.github/workflows/build.yaml	Major restructuring: adds `detect-changes` job for per-package gating, splits `test-general` into 10+ per-package jobs (each independently triggered), adds 3-shard parallelism for `isaaclab_tasks` and `isaaclab (core)`, and removes the old `combine-results` pattern in favour of `dorny/test-reporter`. The `has()` helper function may silently misbehave if `inherit_errexit` is active on the runner's bash.
.github/actions/run-package-tests/action.yml	New composite action that wraps ECR pull, `run-tests`, `dorny/test-reporter` annotation, and fork-PR failure gate into a single reusable step. Removes per-job artifact uploads (test XMLs are no longer downloadable post-run). Adds pull-duration diagnostics and cancellation cleanup.
.github/actions/run-tests/action.yml	Adds shard-index/shard-count inputs and env var forwarding, removes deprecated `cuda-issue-only` input, fixes the `"not "` prefix check to avoid spurious matches on words starting with "not", adds foreground Docker run with `--init --stop-timeout 5` for faster signal handling, and improves fallback XML combination logic.
.github/actions/run-tests/junit_summary.py	New script that parses a JUnit XML and writes a Markdown summary to `$GITHUB_STEP_SUMMARY`. Correctly uses f-strings; handles failures, errors, skips, and passes with collapsible sections. Minor: when all tests pass there is no top-level status banner, only a collapsed green `<details>` block.
tools/conftest.py	Adds file-level sharding via `TEST_SHARD_INDEX`/`TEST_SHARD_COUNT` env vars; sharding is correctly bypassed when `include_files` is set (deferring to benchmarking conftest's test-item sharding). Also fixes the exclude-pattern split that previously matched a comma-separated string as a single substring.
source/isaaclab_tasks/test/benchmarking/conftest.py	Adds `pytest_collection_modifyitems` hook for test-item-level sharding of parametrized test cases (used by `test_train_environments.py`). Correctly reads `TEST_SHARD_INDEX`/`TEST_SHARD_COUNT` and filters items by modulo. Complements the file-level sharding in `tools/conftest.py`.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Push / PR / workflow_dispatch] --> DC[detect-changes\nubuntu-latest]

    DC -->|always| BUILD[build\nBuild Base Docker Image]
    DC -->|mimic/tasks/core/infra| BCUROBO[build-curobo\nBuild cuRobo Image]

    BUILD --> T1[test-isaaclab-tasks 1/3\ntasks/core/infra]
    BUILD --> T2[test-isaaclab-tasks 2/3\ntasks/core/infra]
    BUILD --> T3[test-isaaclab-tasks 3/3\ntasks/core/infra]
    BUILD --> C1[test-isaaclab-core 1/3\ncore/infra]
    BUILD --> C2[test-isaaclab-core 2/3\ncore/infra]
    BUILD --> C3[test-isaaclab-core 3/3\ncore/infra]
    BUILD --> RL[test-isaaclab-rl\nrl/core/infra]
    BUILD --> MIM[test-isaaclab-mimic\nmimic/core/infra]
    BUILD --> CON[test-isaaclab-contrib\ncontrib/core/infra]
    BUILD --> TEL[test-isaaclab-teleop\nteleop/core/infra]
    BUILD --> VIS[test-isaaclab-visualizers\nvisualizers/core/infra]
    BUILD --> ASS[test-isaaclab-assets\nassets/core/infra]
    BUILD --> NEW[test-isaaclab-newton\nnewton/core/infra]
    BUILD --> PHY[test-isaaclab-physx\nphysx/core/infra]
    BUILD --> TET[test-environments-training\ntasks/core/infra]
    BCUROBO --> CUR[test-curobo\nmimic/core/infra]
    BCUROBO --> SKL[test-skillgen\nmimic/tasks/core/infra]

    subgraph "Per-job reporting (each job)"
        RPT[run-package-tests composite\n→ ECR pull\n→ pytest in Docker\n→ dorny/test-reporter\n→ GITHUB_STEP_SUMMARY via junit_summary.py\n→ Fork PR failure gate]
    end

    T1 & T2 & T3 & C1 & C2 & C3 & RL & MIM & CON & TEL & VIS & ASS & NEW & PHY & TET & CUR & SKL --> RPT

_{Reviews (9): Last reviewed commit: "Typo in change detection - collect apps/..." | Re-trigger Greptile}

.github/actions/run-package-tests/action.yml

tools/conftest.py

.github/workflows/build.yaml

myurasov-nv · 2026-03-19T23:15:29Z

@greptile-apps

.github/actions/run-tests/junit_summary.py

.github/workflows/build.yaml

myurasov-nv · 2026-03-19T23:43:06Z

@greptile-apps

myurasov-nv · 2026-03-19T23:58:20Z

@greptile-apps

myurasov-nv · 2026-03-20T00:53:24Z

@greptile-apps

AntoineRichard · 2026-03-20T09:48:17Z

@greptile

…rs as warnings (extracted from 666a006)

This reverts commit f38ea50.

…ive (COPY merges directories rather than replacing them.)

myurasov-nv · 2026-03-21T02:06:07Z

@greptile-apps

myurasov-nv · 2026-03-21T02:25:14Z

@greptile-apps

.github/workflows/build.yaml

.github/actions/run-package-tests/action.yml

kellyguo11 · 2026-03-23T04:34:06Z

.github/actions/combine-results/action.yml

@@ -1,103 +0,0 @@
-# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).


if we are removing this job, are the test jobs going to show fail? we previously set them up to continue on failure so that this combine-results job executes.

also, where are the final reports being presented with this removed?

It seems pretty clear to me to scroll and see the pass or failure on each tests, as I am reviewing, I am also trying to understand what is the benefit of /combine-results/action.yml beside it creates one more entry of pass and fail? It seems removable for me unless I am missing some important consideration.

It was supposed to generate a full report with the exact test cases that failed and their logs. it never really worked too well though, but if we were able to get that working, then it'll save us from going into each job, scrolling to the bottom to see which tests failed and why.

but I'm more concerned about capturing the fail/pass correctly, since before we depended on the combine-results job to check the overall pass/fail

Testing jobs are displaying results individually, so that's not really needed for capturing the results. If fact it succeeds always regardless of testing jobs. It was indeed combining JUnit XML files into one. Do we need that report? My suggestion is to bring it back once we do need that report. :)

I think what kelly is suggesting is that we can go through each individual test to check the JUnit test result details when more inspection is needed, but if we have one summary report that collects all failure, it could be convenient to reference directly(think about case where we just need to copy one table and paste to github,chat,bugs and ask someone to fix rather than manually going through each tests and collect). But I do agree if you feel like this needs another PR to support, we don't have to have it in right now.

myurasov-nv · 2026-03-23T21:12:42Z

@greptile-apps

…re capturing frames (to allow for resources to load)

) # Description Until now every PR ran **all** GPU test jobs regardless of what changed. That is no-good and needs to change. :) This PR: - Splits tests more evenly and according to functionality - Only runs relevant ones; manually all can still be executed - Adds reporting of the passed/failed tests on the Workflow page - Improves job cancellation so resources are freed faster ## Detailed changes - Added a `detect-changes` job that runs `git diff --name-only` to produce per-package `true`/`false` outputs; each test job skips itself when its package is untouched. Core/infra changes still trigger everything. Manual "run all" option available via GH UI. - Split `test-general` into per-package jobs: `test-isaaclab` (2 shards), `test-isaaclab-rl`, `-mimic`, `-contrib`, `-teleop`, `-visualizers`, `-assets`. Each job is independently gated and sharded. - Replaced hardcoded `include-files` whitelists with deterministic `shard-index % shard-count` sharding, fixing 7 silently skipped `isaaclab_tasks` tests. - Broke out cuRobo Planner and SkillGen into separate test jobs with their own conditional execution gates. - Fixed `conftest.py` exclude pattern that was matching a comma-separated string as one substring (never matched). - Container is now killed on cancellation signals so jobs stop quickly via GH UI. - JUnit results are converted to Markdown and displayed on the GH Actions job page, including skip reasons. - Faster code checkout and updated `actions/checkout@v4`, `actions/upload-artifact@v4`, `actions/download-artifact@v4` to latest SHA-pinned versions. - Extracted reusable composite action for repeating Docker test-run boilerplate. - Added Docker image pull timing for diagnostics. - Suppressed Node.js deprecation warnings in CI output. ## How test_train_environments.py is split The test function `test_train_environments` is parametrized by *task_spec* (all registered Isaac environments) and *workflow* (`rl_games`, `rsl_rl`, `sb3`, `skrl`), so pytest expands it into `N_tasks x 4_workflows` individual test items. The `pytest_collection_modifyitems` hook in conftest.py runs after pytest builds this full list and keeps only every Nth item based on `TEST_SHARD_INDEX` / `TEST_SHARD_COUNT` env vars (e.g. shard 0/3 gets items 0, 3, 6, ...). Each CI runner sees ~1/3 of the parametrized combinations, cutting wall time from ~1.2h to ~24min per shard. ## Type of change - New feature (non-breaking change which adds functionality) - Bug fix (non-breaking change which fixes an issue) ## Checklist - [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html) - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [ ] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there --------- Signed-off-by: myurasov-nv <168484206+myurasov-nv@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Antoine Richard <antoiner@nvidia.com>

github-actions bot added the infrastructure label Mar 18, 2026

myurasov-nv changed the title ~~Splits Tests in More Even Portions, Only Runs Relevant Ones~~ [POC] Splits Tests in More Even Portions, Only Runs Relevant Ones Mar 18, 2026

myurasov-nv changed the title ~~[POC] Splits Tests in More Even Portions, Only Runs Relevant Ones~~ [POC] Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation Mar 18, 2026

myurasov-nv force-pushed the my-gated-tests-1 branch from 5ff7ea8 to 349d522 Compare March 19, 2026 18:35

myurasov-nv changed the title ~~[POC] Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation~~ Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation, Adds Reporting Mar 19, 2026

myurasov-nv marked this pull request as ready for review March 19, 2026 21:42

myurasov-nv requested review from hhansen-bdai and kellyguo11 as code owners March 19, 2026 21:42

greptile-apps bot reviewed Mar 19, 2026

View reviewed changes

.github/actions/run-package-tests/action.yml Outdated Show resolved Hide resolved

.github/actions/run-package-tests/action.yml Outdated Show resolved Hide resolved

tools/conftest.py Outdated Show resolved Hide resolved

.github/workflows/build.yaml Show resolved Hide resolved

github-actions bot added the isaac-lab Related to Isaac Lab team label Mar 19, 2026

myurasov-nv force-pushed the my-gated-tests-1 branch from 3f69cf3 to b83016e Compare March 19, 2026 23:13

myurasov-nv force-pushed the my-gated-tests-1 branch from 7d2c6ec to fdec5fc Compare March 19, 2026 23:22

greptile-apps bot reviewed Mar 19, 2026

View reviewed changes

.github/actions/run-tests/junit_summary.py Outdated Show resolved Hide resolved

.github/workflows/build.yaml Show resolved Hide resolved

myurasov-nv force-pushed the my-gated-tests-1 branch from daaf42a to 6b0903b Compare March 20, 2026 02:18

myurasov-nv requested review from Mayankm96 and jtigue-bdai as code owners March 20, 2026 02:18

github-actions bot added the documentation Improvements or additions to documentation label Mar 20, 2026

myurasov-nv force-pushed the my-gated-tests-1 branch from 6b0903b to 6209aaa Compare March 20, 2026 02:19

myurasov-nv requested a review from ooctipus March 20, 2026 03:50

myurasov-nv force-pushed the my-gated-tests-1 branch from 1572661 to 31947bd Compare March 20, 2026 23:17

myurasov-nv added 4 commits March 20, 2026 16:36

Trigger post-merge CI

4d431c2

Add detect-changes job for path-based selective test gating in CI

ed265ca

Split test-general into per-package jobs with sharding and exclude fix

574bf80

Replace tasks include-files whitelists with sharding to fix coverage gap

720e1e9

myurasov-nv added 3 commits March 20, 2026 17:15

Require "not " prefix with space and surface junit_summary parse erro…

91c550a

…rs as warnings (extracted from 666a006)

Revert "TMP: Skip docker rebuilds for testing"

1840dff

This reverts commit f38ea50.

Remove stale source from the deps cache so deleted modules don't surv…

f4bf738

…ive (COPY merges directories rather than replacing them.)

myurasov-nv requested a review from pascal-roth as a code owner March 21, 2026 01:40

myurasov-nv added 5 commits March 20, 2026 18:50

Use XXm:XXs format

2ffa55c

Rm unnecessary header

3b8506c

Rm redundant messages about passed/skipped tests

a928e62

Rm combine-results, we dont rly use it

305660a

Rm junit xmls uploading, we dont use those anywhere, yet

21b69d1

myurasov-nv changed the title ~~Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation, Adds Reporting~~ Splits Tests More Evenly, Only Runs Relevant Ones, Adds Reporting Mar 21, 2026

myurasov-nv added 2 commits March 20, 2026 19:09

--format

3c0a06c

Fix conditional check for no test cases found in report

9f64df5

myurasov-nv requested a review from AntoineRichard March 21, 2026 17:08

kellyguo11 reviewed Mar 23, 2026

View reviewed changes

myurasov-nv added 3 commits March 23, 2026 14:04

Detect changes in apps/ to trigger core, physx, newton, task tests

2f7a257

Revert sha pinning for 3rd-party actions to latest version tags

f9d5556

Typo in change detection - collect apps/ changes

ad85c5a

myurasov-nv requested a review from kellyguo11 March 23, 2026 21:19

myurasov-nv added 3 commits March 23, 2026 14:22

Purge /workspace/isaaclab/ completely in Dockerfile.overlay

f594751

Fix test_observation_manager.py - somple string mismatch

df97b3e

Fix test_first_frame_textured_rendering.py - warm up for 3 steps befo…

27895e6

…re capturing frames (to allow for resources to load)

ooctipus approved these changes Mar 23, 2026

View reviewed changes

myurasov-nv added 2 commits March 23, 2026 15:34

Accont for timeouts in testn timing reporting

a67cb3d

Generatye JUnit report for error-d out tests

37aa4dd

kellyguo11 mentioned this pull request Mar 24, 2026

Add lightweight CI job for pure-Python tests (no GPU) #5095

Closed

3 tasks

myurasov-nv merged commit b0c9894 into isaac-sim:develop Mar 24, 2026
21 of 28 checks passed

		@@ -1,103 +0,0 @@
		# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).

Conversation

myurasov-nv commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Detailed changes

How test_train_environments.py is split

Type of change

Checklist

Uh oh!

greptile-apps bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

myurasov-nv commented Mar 19, 2026

Uh oh!

Uh oh!

Uh oh!

myurasov-nv commented Mar 19, 2026

Uh oh!

myurasov-nv commented Mar 19, 2026

Uh oh!

myurasov-nv commented Mar 20, 2026

Uh oh!

AntoineRichard commented Mar 20, 2026

Uh oh!

myurasov-nv commented Mar 21, 2026

Uh oh!

myurasov-nv commented Mar 21, 2026

Uh oh!

Uh oh!

Uh oh!

kellyguo11 Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

ooctipus Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kellyguo11 Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

myurasov-nv Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ooctipus Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

myurasov-nv commented Mar 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

myurasov-nv commented Mar 18, 2026 •

edited

Loading

greptile-apps bot commented Mar 19, 2026 •

edited

Loading

ooctipus Mar 23, 2026 •

edited

Loading

myurasov-nv Mar 23, 2026 •

edited

Loading

ooctipus Mar 23, 2026 •

edited

Loading