Add version guards for v2_27 build compatibility (#2061)

Closed
lilyjanjigian wants to merge 3 commits into meta-pytorch:main from lilyjanjigian:export-D100670686

lilyjanjigian commented Apr 14, 2026

Summary:

Several diffs that landed over the past few months introduced ncclx-only symbols (ncclWindow_t, ncclx::Hints, NCCL_FAST_INIT_MODE_RING) into torchcomms code without version guards. This broke the build with hpc_comms.use_nccl=stable (upstream NCCL v2_27), which does not define them. As a result, the ~15-20 backend_nccl and backend_gloo tests in TestX that build with this config all failed at compile time.

Fixes:

  • NcclxApi.hpp: Replace constexpr NCCL_WIN_DEFAULT with #ifndef/#define guard to avoid collision with the macro in nccl.h
  • TorchCommNCCLXBootstrap.hpp/.cpp: Wrap ncclx::Hints and NCCL_FAST_INIT_MODE_RING usage with #ifdef NCCLX_CONFIG_SUPPORTED, with fallback paths for upstream NCCL
  • TorchCommNCCLX.cpp: Same ncclx::Hints guard in the split function
  • TorchCommWindowNCCLX.cpp: Wrap get_attr() body with #ifdef NCCL_RMA_SUPPORTED
  • DeviceBackendTraits.hpp: Conditional Window type alias (ncclWindow_t vs void*) based on NCCL_RMA_SUPPORTED
  • PipesDeviceBackend.hpp: Added NcclWin type alias with same conditional
  • ir_include/nccl.h: Added missing NCCL_RMA_SUPPORTED define to the IR stub header used by the device_window_bitcode genrule

Reviewed By: goelayu

Differential Revision: D100670686
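
The guard patterns listed above can be sketched roughly as follows. This is a minimal illustration and not the code from the diff: it does not include nccl.h, `ncclWindowStub` and `window_attr_supported` are hypothetical names, and the fallback value `0` for `NCCL_WIN_DEFAULT` is an assumption. With `NCCL_RMA_SUPPORTED` left undefined, it takes the upstream NCCL path.

```cpp
#include <cassert>
#include <type_traits>

// Uncomment to simulate a build whose NCCL headers advertise RMA support.
// #define NCCL_RMA_SUPPORTED 1

#ifdef NCCL_RMA_SUPPORTED
// Hypothetical stand-in for the opaque window handle that nccl.h would declare.
struct ncclWindowStub;
using ncclWindow_t = ncclWindowStub*;
#endif

// Pattern 1: macro fallback. If a header already defined NCCL_WIN_DEFAULT as a
// macro, a declaration such as `constexpr int NCCL_WIN_DEFAULT = 0;` would be
// macro-expanded and fail to compile, so define it only when it is missing.
#ifndef NCCL_WIN_DEFAULT
#define NCCL_WIN_DEFAULT 0 // value assumed for illustration
#endif

// Pattern 2: conditional type alias, so code that only passes the handle
// around still compiles when ncclWindow_t does not exist.
#ifdef NCCL_RMA_SUPPORTED
using Window = ncclWindow_t;
#else
using Window = void*;
#endif

// Pattern 3: guarded function body with a fallback path (hypothetical helper).
inline bool window_attr_supported(Window /*win*/) {
#ifdef NCCL_RMA_SUPPORTED
  return true;  // the real code would query the window here
#else
  return false; // upstream NCCL: feature unavailable
#endif
}

// Compile-time check that the alias resolved as expected for this configuration.
#ifndef NCCL_RMA_SUPPORTED
static_assert(std::is_same<Window, void*>::value, "fallback alias expected");
#endif
```

Keeping the alias name (`Window`) identical on both paths means call sites need no guards of their own; only the definition site depends on the feature macro.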

meta-cla bot added the CLA Signed label Apr 14, 2026
meta-codesync bot commented Apr 14, 2026

@lilyjanjigian has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100670686.

meta-codesync bot changed the title from "Add version guards for v2_27 build compatibility" to "Add version guards for v2_27 build compatibility (#2061)" Apr 15, 2026
lilyjanjigian added a commit to lilyjanjigian/torchcomms that referenced this pull request Apr 15, 2026
lilyjanjigian added a commit to lilyjanjigian/torchcomms that referenced this pull request Apr 15, 2026
Lily Janjigian added 2 commits April 16, 2026 15:42
Summary: The pipes triton alltoallv module imports from torchcomms.triton.fb at module level, which fails with ImportError when torchcomms is unavailable in CI. The original import guard (triton = None) was insufficient because the triton.jit and requires_torchcomms decorators execute at module-load time, causing AttributeError and NameError respectively. Replace the None stubs with no-op decorator stubs (a SimpleNamespace with a passthrough jit, and a passthrough requires_torchcomms) so the module can be imported safely and tests skip gracefully via their existing TRITON_AVAILABLE / CUDA_AVAILABLE checks.

Differential Revision: D100182678
… test

Summary:
The GetTopologyAssertsOnEmptyTopoData test used EXPECT_DEATH but getTopology() never aborted on empty topology data — it silently produced a topology vector with 0-length data entries. This caused the death test to either report "failed to die" or "threw an exception" depending on the execution.

Fix both sides:
1. Add CHECK_THROW_EXCEPTION validation in getTopology() to reject empty per-transport topology data, consistent with uniflow's error handling conventions (throw, not abort).
2. Change the test from EXPECT_DEATH to EXPECT_THROW(std::runtime_error) to match.

Differential Revision: D100359245
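
The throw-instead-of-abort change described in this commit can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the uniflow code: `TopoData`, this `getTopology` signature, and `throwsRuntimeError` are hypothetical, a plain `throw` stands in for `CHECK_THROW_EXCEPTION`, and a try/catch helper stands in for gtest's `EXPECT_THROW`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-transport topology blob.
using TopoData = std::vector<std::uint8_t>;

// Hypothetical simplified getTopology(): rejects empty per-transport data by
// throwing, rather than silently returning entries with zero-length data.
std::vector<TopoData> getTopology(const std::vector<TopoData>& perTransport) {
  for (std::size_t i = 0; i < perTransport.size(); ++i) {
    if (perTransport[i].empty()) {
      throw std::runtime_error(
          "empty topology data for transport " + std::to_string(i));
    }
  }
  return perTransport;
}

// Framework-free equivalent of EXPECT_THROW(getTopology(x), std::runtime_error):
// returns true only if the call throws std::runtime_error.
bool throwsRuntimeError(const std::vector<TopoData>& input) {
  try {
    (void)getTopology(input);
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

A death test only passes when the process actually terminates, so once the function reports the error by throwing, the matching assertion is on the exception type rather than on process death.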
lilyjanjigian added a commit to lilyjanjigian/torchcomms that referenced this pull request Apr 16, 2026
Summary:
Pull Request resolved: meta-pytorch#2061

meta-codesync bot commented Apr 17, 2026

This pull request has been merged in 13fddd5.

Labels

CLA Signed, fb-exported, Merged, meta-exported
