[pull] main from NVIDIA:main#545

Merged
pull[bot] merged 4 commits into phu0ngng:main from NVIDIA:main
Apr 3, 2026

@pull pull bot commented Apr 3, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

cael-ling and others added 4 commits April 3, 2026 11:32
#2820)

* Compute swizzle_idx once per thread and pass into ComputeKernel.

Signed-off-by: Cael Ling <caell@nvidia.com>

* one __syncthreads per stage in GroupHadamardAmaxTmaKernel

Signed-off-by: Cael Ling <caell@nvidia.com>

* streamline group Hadamard ComputeKernel loads

Signed-off-by: Cael Ling <caell@nvidia.com>

* streamline group Hadamard ComputeKernel loads

Signed-off-by: Cael Ling <caell@nvidia.com>

* streamline group Hadamard ComputeKernel loads

Signed-off-by: Cael Ling <caell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* one __syncthreads per stage in GroupHadamardAmaxTmaKernel

Signed-off-by: Cael Ling <caell@nvidia.com>
Made-with: Cursor

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Compute swizzle_idx once per thread and pass into ComputeKernel.

Signed-off-by: Cael Ling <caell@nvidia.com>

* Fix kReturnIdentityAmax path

Signed-off-by: Cael Ling <caell@nvidia.com>

* Fix kReturnIdentityAmax path

Signed-off-by: Cael Ling <caell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply the change to other variants

Signed-off-by: Cael Ling <caell@nvidia.com>

* Refactor the change to other variants

Signed-off-by: Cael Ling <caell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor the change to other variants

Signed-off-by: Cael Ling <caell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor the ldmatrix logic

Signed-off-by: Cael Ling <caell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Cael Ling <caell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…ter CI error re… (#2802)

* Capture subprocess stderr in distributed tests for better CI error reporting

Distributed tests launch subprocesses via torch.distributed.launch/torchrun.
When these fail, pytest only captures the CalledProcessError from the parent
process, not the actual worker traceback. This makes CI JUnit XML reports
show "exit code 1" with no useful error detail.

Add run_distributed() utility to tests/pytorch/utils.py that captures stderr
while letting stdout stream to the terminal. On failure, the worker's stderr
(containing the actual Python traceback) is included in the AssertionError,
which pytest writes into the JUnit XML report.

Behavior:
- Interactive use: stdout streams in real time (unchanged), stderr shown on failure
- CI/JUnit XML: failure reports now include the actual worker traceback
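
The behavior described above can be sketched as follows. This is a minimal illustration, not the actual helper in tests/pytorch/utils.py: the signature, the `timeout` parameter, and the error message format are assumptions. Only the idea (inherit stdout, capture stderr, raise an AssertionError carrying the worker's stderr) comes from the commit message.

```python
import subprocess
import sys


def run_distributed(cmd, timeout=None):
    """Run a launcher command; stream stdout, capture stderr for failure reports.

    Hypothetical signature: the real utility may take different arguments.
    """
    result = subprocess.run(
        cmd,
        stdout=None,             # inherit the parent's stdout so output streams live
        stderr=subprocess.PIPE,  # capture stderr, which holds the worker traceback
        text=True,
        timeout=timeout,
        check=False,             # inspect the return code ourselves
    )
    if result.returncode != 0:
        # pytest records the AssertionError message in the JUnit XML report,
        # so the worker traceback ends up next to the failing test.
        raise AssertionError(
            f"Command failed with exit code {result.returncode}: {cmd}\n"
            f"--- stderr ---\n{result.stderr}"
        )
    return result


if __name__ == "__main__":
    # Stand-in for a torchrun invocation; any argv list works the same way.
    run_distributed([sys.executable, "-c", "print('worker ok')"])
```

Raising AssertionError rather than letting CalledProcessError propagate keeps the worker traceback in the failure message itself, which is the part a JUnit report preserves.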

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* Add JUnit XML output to ctest in L0_cppunittest

Add --output-junit flag so ctest writes JUnit XML to /logs/,
matching the pattern used by pytest tests. The XML is written
before ctest exits, so it's captured even on test failure.

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

---------

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
* Add unswizzling functions for scaling factors in swizzle module

- Introduced `nvte_unswizzle_scaling_factors` to convert swizzled scaling factors back to row-major format.
- Implemented `regs_unshuffle_with_bit_shifts` and `regs_unshuffle` for unshuffling operations in CUDA kernels.
- Added `unswizzle_row_scaling_kernel_impl` and `unswizzle_col_scaling_kernel_impl` for handling unswizzling in row and column scaling respectively.

These changes enhance the functionality of the swizzle module, enabling better handling of scaling factors in tensor operations.
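
As a rough host-side illustration of what swizzle and unswizzle mean as inverse permutations, the sketch below works on a single 128 x 4 tile. The offset formula is an assumption based on the tile layout documented for cuBLAS block-scaled GEMMs; the function names are hypothetical, and this is not the CUDA kernel code added in this commit.

```python
TILE_ROWS, TILE_COLS = 128, 4  # assumed tile shape for row-wise scaling factors


def swizzled_offset(row, col):
    """Offset of element (row, col) inside one swizzled tile (assumed layout)."""
    return (row % 32) * 16 + (row // 32) * 4 + col


def swizzle_tile(row_major):
    """Permute a flat row-major 128 x 4 tile into the swizzled order."""
    out = [None] * (TILE_ROWS * TILE_COLS)
    for r in range(TILE_ROWS):
        for c in range(TILE_COLS):
            out[swizzled_offset(r, c)] = row_major[r * TILE_COLS + c]
    return out


def unswizzle_tile(swizzled):
    """Apply the inverse permutation to recover the flat row-major tile."""
    out = [None] * (TILE_ROWS * TILE_COLS)
    for r in range(TILE_ROWS):
        for c in range(TILE_COLS):
            out[r * TILE_COLS + c] = swizzled[swizzled_offset(r, c)]
    return out


# Roundtrip check: unswizzling a swizzled tile returns the original data.
tile = list(range(TILE_ROWS * TILE_COLS))
assert unswizzle_tile(swizzle_tile(tile)) == tile
```

Because the offset function is a bijection on the 512 tile positions, the unswizzle step only needs to read through the same index map that the swizzle step writes through.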

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Add swizzle/unswizzle roundtrip test for scaling factors

This test exercises the changes introduced for unswizzling.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Added another unswizzling functionality test for scaling factors

- Introduced `compute_ref_unswizzle` to handle the conversion of swizzled scaling factors back to their original format.
- Added `performTestUnswizzle1D` to validate the unswizzling process with various scaling modes.
- Created `UnswizzleTestSuite` for comprehensive testing of unswizzling operations.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Moved swizzle_row_scaling_kernel implementation back to its original place

- Moved the definition of `swizzle_row_scaling_kernel` to a new location for better organization.
- Ensured the kernel implementation is now properly defined and accessible for scaling operations in the swizzle module.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Add multi-tensor unswizzling functions for scaling factors

- Introduced `multi_tensor_unswizzle_scaling_factors` to convert swizzled scaling factors back to their original row-major format.
- Implemented CUDA kernels for unswizzling in both row and column scaling, enhancing the swizzle module's functionality.
- Updated the launch function to handle multiple tensor unswizzling operations efficiently.

These changes improve the handling of scaling factors in tensor operations, ensuring better performance and organization within the swizzle module.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added greptile suggestions

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removed unused check from tests; read input directly as const rather than casting

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor unswizzling functions and update test cases for scaling factors

- Updated unswizzling kernel implementations to remove original_M and original_K parameters, simplifying the function signatures.
- Enhanced test suite to utilize new unswizzling data shapes, ensuring comprehensive coverage of aligned and padded cases.
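
A rough sketch of what "aligned" versus "padded" shapes could mean here, assuming scaling-factor matrices are padded up to multiples of 128 rows and 4 columns. The alignment values, helper names, and example shapes are assumptions for illustration and are not taken from the test suite.

```python
def round_up(value, multiple):
    """Round value up to the nearest multiple."""
    return (value + multiple - 1) // multiple * multiple


def padded_scale_shape(rows, cols, row_align=128, col_align=4):
    """Padded (rows, cols) of a scaling-factor matrix under the assumed alignment."""
    return round_up(rows, row_align), round_up(cols, col_align)


# One aligned and one padded case (illustrative shapes only).
for shape in [(256, 8), (200, 6)]:
    padded = padded_scale_shape(*shape)
    kind = "aligned" if padded == shape else "padded"
    print(shape, "->", padded, kind)
```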

These changes improve the clarity and efficiency of the unswizzling process in the swizzle module.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor unswizzling scaling factors to use a launch function

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Change unswizzling to use output as gt.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Refactor unswizzling scaling factors to improve input validation and streamline processing. Still need to check whether rowwise and columnwise can both be true; if so, the if/else needs to account for that.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Fix multi_tensor_unswizzle_scaling_factors to correctly reference output tensors for scaling mode and data validation. Updated checks for input and output tensor shapes to ensure proper handling of row-wise and column-wise scaling factors.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Enhance swizzle tests and unswizzling validation

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Fix typos and update validation checks in swizzle.cu

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Update validation checks in multi_tensor_unswizzle_scaling_factors to use input numel

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Typo

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Gaetan Lepage <gaetan@glepage.com>
@pull pull bot locked and limited conversation to collaborators Apr 3, 2026
@pull pull bot added the ⤵️ pull label Apr 3, 2026
@pull pull bot merged commit e83c097 into phu0ngng:main Apr 3, 2026
@pull pull bot had a problem deploying to github-pages April 3, 2026 22:33 Failure
4 participants