Splits Tests More Evenly, Only Runs Relevant Ones, Adds Reporting#5059
Splits Tests More Evenly, Only Runs Relevant Ones, Adds Reporting#5059myurasov-nv merged 107 commits intoisaac-sim:developfrom
Conversation
5ff7ea8 to
349d522
Compare
Greptile SummaryThis PR substantially overhauls the CI pipeline for Isaac Lab: it introduces a lightweight Key changes and observations:
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Push / PR / workflow_dispatch] --> DC[detect-changes\nubuntu-latest]
DC -->|always| BUILD[build\nBuild Base Docker Image]
DC -->|mimic/tasks/core/infra| BCUROBO[build-curobo\nBuild cuRobo Image]
BUILD --> T1[test-isaaclab-tasks 1/3\ntasks/core/infra]
BUILD --> T2[test-isaaclab-tasks 2/3\ntasks/core/infra]
BUILD --> T3[test-isaaclab-tasks 3/3\ntasks/core/infra]
BUILD --> C1[test-isaaclab-core 1/3\ncore/infra]
BUILD --> C2[test-isaaclab-core 2/3\ncore/infra]
BUILD --> C3[test-isaaclab-core 3/3\ncore/infra]
BUILD --> RL[test-isaaclab-rl\nrl/core/infra]
BUILD --> MIM[test-isaaclab-mimic\nmimic/core/infra]
BUILD --> CON[test-isaaclab-contrib\ncontrib/core/infra]
BUILD --> TEL[test-isaaclab-teleop\nteleop/core/infra]
BUILD --> VIS[test-isaaclab-visualizers\nvisualizers/core/infra]
BUILD --> ASS[test-isaaclab-assets\nassets/core/infra]
BUILD --> NEW[test-isaaclab-newton\nnewton/core/infra]
BUILD --> PHY[test-isaaclab-physx\nphysx/core/infra]
BUILD --> TET[test-environments-training\ntasks/core/infra]
BCUROBO --> CUR[test-curobo\nmimic/core/infra]
BCUROBO --> SKL[test-skillgen\nmimic/tasks/core/infra]
subgraph "Per-job reporting (each job)"
RPT[run-package-tests composite\n→ ECR pull\n→ pytest in Docker\n→ dorny/test-reporter\n→ GITHUB_STEP_SUMMARY via junit_summary.py\n→ Fork PR failure gate]
end
T1 & T2 & T3 & C1 & C2 & C3 & RL & MIM & CON & TEL & VIS & ASS & NEW & PHY & TET & CUR & SKL --> RPT
Reviews (9): Last reviewed commit: "Typo in change detection - collect apps/..." | Re-trigger Greptile |
3f69cf3 to
b83016e
Compare
7d2c6ec to
fdec5fc
Compare
2 similar comments
daaf42a to
6b0903b
Compare
6b0903b to
6209aaa
Compare
|
@greptile |
1572661 to
31947bd
Compare
| @@ -1,103 +0,0 @@ | |||
| # Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). | |||
There was a problem hiding this comment.
if we are removing this job, are the test jobs going to show fail? we previously set them up to continue on failure so that this combine-results job executes.
also, where are the final reports being presented with this removed?
There was a problem hiding this comment.
It seems pretty clear to me to scroll and see the pass or failure on each tests, as I am reviewing, I am also trying to understand what is the benefit of /combine-results/action.yml beside it creates one more entry of pass and fail? It seems removable for me unless I am missing some important consideration.
There was a problem hiding this comment.
It was supposed to generate a full report with the exact test cases that failed and their logs. it never really worked too well though, but if we were able to get that working, then it'll save us from going into each job, scrolling to the bottom to see which tests failed and why.
but I'm more concerned about capturing the fail/pass correctly, since before we depended on the combine-results job to check the overall pass/fail
There was a problem hiding this comment.
Testing jobs are displaying results individually, so that's not really needed for capturing the results. If fact it succeeds always regardless of testing jobs. It was indeed combining JUnit XML files into one. Do we need that report? My suggestion is to bring it back once we do need that report. :)
There was a problem hiding this comment.
I think what kelly is suggesting is that we can go through each individual test to check the JUnit test result details when more inspection is needed, but if we have one summary report that collects all failure, it could be convenient to reference directly(think about case where we just need to copy one table and paste to github,chat,bugs and ask someone to fix rather than manually going through each tests and collect). But I do agree if you feel like this needs another PR to support, we don't have to have it in right now.
…re capturing frames (to allow for resources to load)
) # Description Until now every PR ran **all** GPU test jobs regardless of what changed. That is no-good and needs to change. :) This PR: - Splits tests more evenly and according to functionality - Only runs relevant ones; manually all can still be executed - Adds reporting of the passed/failed tests on the Workflow page - Improves job cancellation so resources are freed faster ## Detailed changes - Added a `detect-changes` job that runs `git diff --name-only` to produce per-package `true`/`false` outputs; each test job skips itself when its package is untouched. Core/infra changes still trigger everything. Manual "run all" option available via GH UI. - Split `test-general` into per-package jobs: `test-isaaclab` (2 shards), `test-isaaclab-rl`, `-mimic`, `-contrib`, `-teleop`, `-visualizers`, `-assets`. Each job is independently gated and sharded. - Replaced hardcoded `include-files` whitelists with deterministic `shard-index % shard-count` sharding, fixing 7 silently skipped `isaaclab_tasks` tests. - Broke out cuRobo Planner and SkillGen into separate test jobs with their own conditional execution gates. - Fixed `conftest.py` exclude pattern that was matching a comma-separated string as one substring (never matched). - Container is now killed on cancellation signals so jobs stop quickly via GH UI. - JUnit results are converted to Markdown and displayed on the GH Actions job page, including skip reasons. - Faster code checkout and updated `actions/checkout@v4`, `actions/upload-artifact@v4`, `actions/download-artifact@v4` to latest SHA-pinned versions. - Extracted reusable composite action for repeating Docker test-run boilerplate. - Added Docker image pull timing for diagnostics. - Suppressed Node.js deprecation warnings in CI output. ## How test_train_environments.py is split The test function `test_train_environments` is parametrized by *task_spec* (all registered Isaac environments) and *workflow* (`rl_games`, `rsl_rl`, `sb3`, `skrl`), so pytest expands it into `N_tasks x 4_workflows` individual test items. The `pytest_collection_modifyitems` hook in conftest.py runs after pytest builds this full list and keeps only every Nth item based on `TEST_SHARD_INDEX` / `TEST_SHARD_COUNT` env vars (e.g. shard 0/3 gets items 0, 3, 6, ...). Each CI runner sees ~1/3 of the parametrized combinations, cutting wall time from ~1.2h to ~24min per shard. ## Type of change - New feature (non-breaking change which adds functionality) - Bug fix (non-breaking change which fixes an issue) ## Checklist - [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html) - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [ ] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there --------- Signed-off-by: myurasov-nv <168484206+myurasov-nv@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Antoine Richard <antoiner@nvidia.com>
Description
Until now every PR ran all GPU test jobs regardless of what changed. That is no-good and needs to change. :)
This PR:
Detailed changes
detect-changesjob that runsgit diff --name-onlyto produce per-packagetrue/falseoutputs; each test job skips itself when its package is untouched. Core/infra changes still trigger everything. Manual "run all" option available via GH UI.test-generalinto per-package jobs:test-isaaclab(2 shards),test-isaaclab-rl,-mimic,-contrib,-teleop,-visualizers,-assets. Each job is independently gated and sharded.include-fileswhitelists with deterministicshard-index % shard-countsharding, fixing 7 silently skippedisaaclab_taskstests.conftest.pyexclude pattern that was matching a comma-separated string as one substring (never matched).actions/checkout@v4,actions/upload-artifact@v4,actions/download-artifact@v4to latest SHA-pinned versions.How test_train_environments.py is split
The test function
test_train_environmentsis parametrized by task_spec (all registered Isaac environments) and workflow (rl_games,rsl_rl,sb3,skrl), so pytest expands it intoN_tasks x 4_workflowsindividual test items. Thepytest_collection_modifyitemshook in conftest.py runs after pytest builds this full list and keeps only every Nth item based onTEST_SHARD_INDEX/TEST_SHARD_COUNTenv vars (e.g. shard 0/3 gets items 0, 3, 6, ...). Each CI runner sees ~1/3 of the parametrized combinations, cutting wall time from ~1.2h to ~24min per shard.Type of change
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there