Merged
#2820)

* Compute swizzle_idx once per thread and pass into ComputeKernel.

Signed-off-by: Cael Ling <caell@nvidia.com>

* one __syncthreads per stage in GroupHadamardAmaxTmaKernel

Signed-off-by: Cael Ling <caell@nvidia.com>

* streamline group Hadamard ComputeKernel loads

Signed-off-by: Cael Ling <caell@nvidia.com>

* streamline group Hadamard ComputeKernel loads

Signed-off-by: Cael Ling <caell@nvidia.com>

* streamline group Hadamard ComputeKernel loads

Signed-off-by: Cael Ling <caell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* one __syncthreads per stage in GroupHadamardAmaxTmaKernel

Signed-off-by: Cael Ling <caell@nvidia.com>
Made-with: Cursor

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Compute swizzle_idx once per thread and pass into ComputeKernel.

Signed-off-by: Cael Ling <caell@nvidia.com>

* Fix kReturnIdentityAmax path

Signed-off-by: Cael Ling <caell@nvidia.com>

* Fix kReturnIdentityAmax path

Signed-off-by: Cael Ling <caell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply the change to other variants

Signed-off-by: Cael Ling <caell@nvidia.com>

* Refactor the change to other variants

Signed-off-by: Cael Ling <caell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor the change to other variants

Signed-off-by: Cael Ling <caell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor the ldmatrix logics

Signed-off-by: Cael Ling <caell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Cael Ling <caell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…ter CI error re… (#2802)

* Capture subprocess stderr in distributed tests for better CI error reporting

Distributed tests launch subprocesses via torch.distributed.launch/torchrun. When these fail, pytest only captures the CalledProcessError from the parent process, not the actual worker traceback. This makes CI JUnit XML reports show "exit code 1" with no useful error detail.

Add run_distributed() utility to tests/pytorch/utils.py that captures stderr while letting stdout stream to the terminal. On failure, the worker's stderr (containing the actual Python traceback) is included in the AssertionError, which pytest writes into the JUnit XML report.

Behavior:
- Interactive use: stdout streams in real time (unchanged), stderr shown on failure
- CI/JUnit XML: failure reports now include the actual worker traceback

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* Add JUnit XML output to ctest in L0_cppunittest

Add --output-junit flag so ctest writes JUnit XML to /logs/, matching the pattern used by pytest tests. The XML is written before ctest exits, so it's captured even on test failure.

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

---------

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
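To illustrate the stderr-capture behavior described in the commit above, here is a minimal sketch of what such a helper could look like. The signature, the error-message format, and the use of `subprocess.run` are assumptions made for illustration; the actual `run_distributed()` in tests/pytorch/utils.py may differ.

```python
import subprocess
import sys


def run_distributed(args, **kwargs):
    """Run a command, streaming stdout and capturing stderr.

    Hypothetical sketch: the real helper's signature and message
    format are not shown in the commit message and may differ.
    """
    # stdout is left untouched, so it is inherited from the parent and
    # streams in real time; only stderr is redirected into a pipe.
    result = subprocess.run(
        args, stderr=subprocess.PIPE, text=True, check=False, **kwargs
    )
    if result.returncode != 0:
        # Raising AssertionError with the worker's stderr puts the real
        # traceback into the pytest failure report (and the JUnit XML).
        cmd = " ".join(str(a) for a in args)
        raise AssertionError(
            f"Command failed with exit code {result.returncode}: {cmd}\n"
            f"--- stderr ---\n{result.stderr}"
        )
    return result


if __name__ == "__main__":
    # Simulate a failing worker that writes a message to stderr.
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('worker traceback'); sys.exit(1)"]
    try:
        run_distributed(failing)
    except AssertionError as err:
        print(err)
```

One trade-off of this approach is that stderr is buffered until the process exits, so warnings from the workers only become visible on failure.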
* Add unswizzling functions for scaling factors in swizzle module

- Introduced `nvte_unswizzle_scaling_factors` to convert swizzled scaling factors back to row-major format.
- Implemented `regs_unshuffle_with_bit_shifts` and `regs_unshuffle` for unshuffling operations in CUDA kernels.
- Added `unswizzle_row_scaling_kernel_impl` and `unswizzle_col_scaling_kernel_impl` for handling unswizzling in row and column scaling respectively.

These changes enhance the functionality of the swizzle module, enabling better handling of scaling factors in tensor operations.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Add swizzle/unswizzle roundtrip test for scaling factors

These enhancements test the changes introduced for unswizzling.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Added another unswizzling functionality test for scaling factors

- Introduced `compute_ref_unswizzle` to handle the conversion of swizzled scaling factors back to their original format.
- Added `performTestUnswizzle1D` to validate the unswizzling process with various scaling modes.
- Created `UnswizzleTestSuite` for comprehensive testing of unswizzling operations.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Moved swizzle_row_scaling_kernel implementation at its original place

- Moved the definition of `swizzle_row_scaling_kernel` to a new location for better organization.
- Ensured the kernel implementation is now properly defined and accessible for scaling operations in the swizzle module.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Add multi-tensor unswizzling functions for scaling factors

- Introduced `multi_tensor_unswizzle_scaling_factors` to convert swizzled scaling factors back to their original row-major format.
- Implemented CUDA kernels for unswizzling in both row and column scaling, enhancing the swizzle module's functionality.
- Updated the launch function to handle multiple tensor unswizzling operations efficiently.

These changes improve the handling of scaling factors in tensor operations, ensuring better performance and organization within the swizzle module.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added greptile suggestions

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removed unused check from tests and reading input directly as const rather than casting

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor unswizzling functions and update test cases for scaling factors

- Updated unswizzling kernel implementations to remove original_M and original_K parameters, simplifying the function signatures.
- Enhanced test suite to utilize new unswizzling data shapes, ensuring comprehensive coverage of aligned and padded cases.

These changes improve the clarity and efficiency of the unswizzling process in the swizzle module.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor unswizzling scaling factors to use a launch function

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Change unswizzling to use output as gt.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Refactor unswizzling scaling factors to improve input validation and streamline processing. Need to check if rowwise and columnwise both can be true. If yes the if else needs to account for that

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Fix multi_tensor_unswizzle_scaling_factors to correctly reference output tensors for scaling mode and data validation. Updated checks for input and output tensor shapes to ensure proper handling of row-wise and column-wise scaling factors.

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Enhance swizzle tests and unswizzling validation

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Fix typos and update validation checks in swizzle.cu

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Update validation checks in multi_tensor_unswizzle_scaling_factors to use input numel

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* Typo

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishek <abhi.dtu11@gmail.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Przemek Tredak <ptredak@nvidia.com>
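The swizzle/unswizzle roundtrip idea mentioned in the commits above can be sketched on the host in plain Python. The tile shape (128 x 4) and the offset formula below are assumptions chosen for illustration, and every function name is hypothetical. This is not the layout or the code in swizzle.cu, and it covers only the aligned case, without padding.

```python
TILE_ROWS, TILE_COLS = 128, 4  # assumed tile shape, for illustration only


def swizzled_offset(r, c, K):
    """Map row-major index (r, c) to an offset in a toy tiled layout."""
    tiles_per_row = K // TILE_COLS
    tile = (r // TILE_ROWS) * tiles_per_row + (c // TILE_COLS)
    rr, cc = r % TILE_ROWS, c % TILE_COLS
    # Within a tile, interleave rows in groups of 32 (assumed pattern).
    return tile * TILE_ROWS * TILE_COLS + (rr % 32) * 16 + (rr // 32) * 4 + cc


def swizzle(x, M, K):
    """Permute a flat row-major M x K list into the toy swizzled order."""
    assert M % TILE_ROWS == 0 and K % TILE_COLS == 0, "aligned shapes only"
    out = [None] * (M * K)
    for r in range(M):
        for c in range(K):
            out[swizzled_offset(r, c, K)] = x[r * K + c]
    return out


def unswizzle(y, M, K):
    """Invert swizzle(): gather elements back into row-major order."""
    assert M % TILE_ROWS == 0 and K % TILE_COLS == 0, "aligned shapes only"
    return [y[swizzled_offset(r, c, K)] for r in range(M) for c in range(K)]


if __name__ == "__main__":
    M, K = 256, 8
    x = list(range(M * K))
    # Roundtrip check: unswizzling a swizzled buffer restores the input.
    assert unswizzle(swizzle(x, M, K), M, K) == x
    print("roundtrip ok")
```

Because both directions share one offset function, the roundtrip property holds by construction. A real test would instead compare kernel output against an independent reference.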
Signed-off-by: Gaetan Lepage <gaetan@glepage.com>
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)