Add version guards for v2_27 build compatibility (#2061) #2061
Closed
lilyjanjigian wants to merge 3 commits into meta-pytorch:main from
@lilyjanjigian has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100670686.
c30cb00 to 26b16ed
lilyjanjigian added a commit to lilyjanjigian/torchcomms that referenced this pull request Apr 15, 2026

Summary: Several diffs landed over the past few months that introduced ncclx-only types (ncclWindow_t, ncclx::Hints, NCCL_FAST_INIT_MODE_RING) into torchcomms code without version guards. This broke the build when using hpc_comms.use_nccl=stable (upstream NCCL v2_27), which doesn't define these types. The ~15-20 backend_nccl and backend_gloo tests in TestX that build with this config were all failing at compile time.

Fixes:
- NcclxApi.hpp: Replace constexpr NCCL_WIN_DEFAULT with #ifndef/#define guard to avoid collision with the macro in nccl.h
- TorchCommNCCLXBootstrap.hpp/.cpp: Wrap ncclx::Hints and NCCL_FAST_INIT_MODE_RING usage with #ifdef NCCLX_CONFIG_SUPPORTED, with fallback paths for upstream NCCL
- TorchCommNCCLX.cpp: Same ncclx::Hints guard in the split function
- TorchCommWindowNCCLX.cpp: Wrap get_attr() body with #ifdef NCCL_RMA_SUPPORTED
- DeviceBackendTraits.hpp: Conditional Window type alias (ncclWindow_t vs void*) based on NCCL_RMA_SUPPORTED
- PipesDeviceBackend.hpp: Added NcclWin type alias with same conditional
- ir_include/nccl.h: Added missing NCCL_RMA_SUPPORTED define to the IR stub header used by the device_window_bitcode genrule

Differential Revision: D100670686
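The two guard patterns named in the Fixes list (an #ifndef/#define guard for a constant, and a conditional type alias keyed on a feature macro) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the placeholder value 0 for NCCL_WIN_DEFAULT and the helpers rmaSupported() and defaultWindowFlags() are assumptions made for the example, and no real NCCL header is included, so the upstream-NCCL fallback branches are the ones that compile.

```cpp
#include <type_traits>

// No NCCL header is included in this sketch, so NCCL_RMA_SUPPORTED and
// ncclWindow_t are both undefined and the fallback branches are taken.

// Define the constant only if nccl.h has not already provided a macro with
// the same name. The value 0 is a placeholder assumption for this sketch.
#ifndef NCCL_WIN_DEFAULT
#define NCCL_WIN_DEFAULT 0
#endif

// Conditional alias: use the real window handle type when the RMA feature
// macro is present, otherwise fall back to an opaque pointer.
#ifdef NCCL_RMA_SUPPORTED
using Window = ncclWindow_t;
#else
using Window = void*;
#endif

// Hypothetical helper: reports which branch was compiled.
constexpr bool rmaSupported() {
#ifdef NCCL_RMA_SUPPORTED
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: returns whichever definition of the constant won.
inline int defaultWindowFlags() {
  return NCCL_WIN_DEFAULT;
}

// Without the feature macro, the alias must resolve to the opaque fallback.
static_assert(std::is_same<Window, void*>::value,
              "fallback alias expected when NCCL_RMA_SUPPORTED is undefined");
```

Code that only stores or forwards a Window compiles unchanged under either branch, which is presumably why an opaque void* is a workable fallback.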
added 2 commits April 16, 2026 15:42

Summary: The pipes triton alltoallv module imports from torchcomms.triton.fb at module level, which fails with ImportError when torchcomms is unavailable in CI. The original import guard (triton = None) was insufficient because triton.jit and requires_torchcomms decorators execute at module-load time, causing AttributeError and NameError respectively. Replace the None stubs with no-op decorator stubs (SimpleNamespace with a passthrough jit, and a passthrough requires_torchcomms) so the module can be imported safely and tests skip gracefully via their existing TRITON_AVAILABLE / CUDA_AVAILABLE checks.

Differential Revision: D100182678

… test

Summary: The GetTopologyAssertsOnEmptyTopoData test used EXPECT_DEATH, but getTopology() never aborted on empty topology data; it silently produced a topology vector with 0-length data entries. This caused the death test to either report "failed to die" or "threw an exception" depending on the execution.

Fix both sides:
1. Add CHECK_THROW_EXCEPTION validation in getTopology() to reject empty per-transport topology data, consistent with uniflow's error handling conventions (throw, not abort).
2. Change the test from EXPECT_DEATH to EXPECT_THROW(std::runtime_error) to match.

Differential Revision: D100359245
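The throw-instead-of-abort behaviour described for getTopology() can be illustrated with a small self-contained sketch. The names here are hypothetical stand-ins (TransportTopo, getTopologySketch, throwsRuntimeError); the real function uses the CHECK_THROW_EXCEPTION macro and a GoogleTest EXPECT_THROW assertion, neither of which is reproduced.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one transport's serialized topology data.
struct TransportTopo {
  std::string name;
  std::vector<unsigned char> data;
};

// Sketch of the validation: reject empty per-transport data by throwing
// std::runtime_error rather than silently emitting a 0-length entry.
std::vector<std::vector<unsigned char>> getTopologySketch(
    const std::vector<TransportTopo>& transports) {
  std::vector<std::vector<unsigned char>> out;
  for (const auto& t : transports) {
    if (t.data.empty()) {
      throw std::runtime_error("empty topology data for transport " + t.name);
    }
    out.push_back(t.data);
  }
  return out;
}

// Hypothetical helper playing the role of an EXPECT_THROW-style check:
// returns true only if a std::runtime_error was thrown.
bool throwsRuntimeError(const std::vector<TransportTopo>& transports) {
  try {
    getTopologySketch(transports);
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

An exception can be caught and checked in-process, whereas a death test depends on the process actually terminating, which is consistent with the inconsistent results reported for the old EXPECT_DEATH test.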
26b16ed to 27d0723
lilyjanjigian added a commit to lilyjanjigian/torchcomms that referenced this pull request Apr 16, 2026

Summary: Pull Request resolved: meta-pytorch#2061

Several diffs landed over the past few months that introduced ncclx-only types (ncclWindow_t, ncclx::Hints, NCCL_FAST_INIT_MODE_RING) into torchcomms code without version guards. This broke the build when using hpc_comms.use_nccl=stable (upstream NCCL v2_27), which doesn't define these types. The ~15-20 backend_nccl and backend_gloo tests in TestX that build with this config were all failing at compile time.

Fixes:
- NcclxApi.hpp: Replace constexpr NCCL_WIN_DEFAULT with #ifndef/#define guard to avoid collision with the macro in nccl.h
- TorchCommNCCLXBootstrap.hpp/.cpp: Wrap ncclx::Hints and NCCL_FAST_INIT_MODE_RING usage with #ifdef NCCLX_CONFIG_SUPPORTED, with fallback paths for upstream NCCL
- TorchCommNCCLX.cpp: Same ncclx::Hints guard in the split function
- TorchCommWindowNCCLX.cpp: Wrap get_attr() body with #ifdef NCCL_RMA_SUPPORTED
- DeviceBackendTraits.hpp: Conditional Window type alias (ncclWindow_t vs void*) based on NCCL_RMA_SUPPORTED
- PipesDeviceBackend.hpp: Added NcclWin type alias with same conditional
- ir_include/nccl.h: Added missing NCCL_RMA_SUPPORTED define to the IR stub header used by the device_window_bitcode genrule

Reviewed By: goelayu

Differential Revision: D100670686
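The "#ifdef NCCLX_CONFIG_SUPPORTED, with fallback paths for upstream NCCL" pattern might look something like the sketch below. Only the macro name comes from the summary above. BootstrapOptions, makeBootstrapOptions(), and the placeholder hint string are hypothetical stand-ins, and no ncclx API is called.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the options a bootstrap routine might assemble.
struct BootstrapOptions {
  bool fastInitRing = false;       // stand-in for a ring fast-init mode
  std::vector<std::string> hints;  // stand-in for ncclx-style hints
};

// The ncclx-only configuration lives inside the guard; the #else branch is
// the fallback path that must also compile against upstream NCCL.
BootstrapOptions makeBootstrapOptions() {
  BootstrapOptions opts;
#ifdef NCCLX_CONFIG_SUPPORTED
  opts.fastInitRing = true;
  opts.hints.push_back("example_hint=1");  // placeholder, not a real hint key
#else
  // Upstream NCCL: no extended config is available, so keep the defaults.
#endif
  return opts;
}
```

Keeping the guard inside the function body, rather than around the whole function, means call sites need no conditional compilation of their own.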
27d0723 to 1d58c34
This pull request has been merged in 13fddd5.