@kip-cxj (Contributor) commented Nov 3, 2025

P2P mode adaptation

We have successfully adapted the P2P mode and validated it by running the existing examples from the current project in a two-node scenario.

Status

✅ Implementation
✅ Testing

Prerequisites

  1. This implementation uses the same environment configuration as the previous PR, "[Hardware] broadcast support for Huawei Ascend NPU" (#39).

  2. Ascend Direct Transport must be installed in Mooncake. On Ascend devices the transfer engine cannot be installed via pip; it has to be compiled from source.

Limitations and Future Work

  1. A temporary sleep has been added to circumvent a mutual lock issue caused by current Ascend hardware constraints. We plan to resolve this natively on Ascend in future updates.
  2. We intend to create an example where the complete model weights reside on Server 0, while vLLM is deployed on Server 1. The Checkpoint Engine will be utilized to synchronize weights from Server 0 to Server 1.
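
The planned Server 0 to Server 1 flow in item 2 could look roughly like the sketch below. It is a toy, in-process simulation only: it does not call the Checkpoint Engine, Mooncake, or vLLM, and every name in it (`make_buckets`, `sync_weights`, `digest`, the `bucket_size` parameter, the layer names) is hypothetical. Sending weights in size-limited buckets is an assumption made for illustration, not something this PR states.

```python
import hashlib


def make_buckets(weights, bucket_size):
    """Group (name, blob) pairs into buckets of at most bucket_size bytes.

    A single blob larger than bucket_size still gets a bucket of its own.
    """
    buckets, current, used = [], [], 0
    for name, blob in weights.items():
        if current and used + len(blob) > bucket_size:
            buckets.append(current)
            current, used = [], 0
        current.append((name, blob))
        used += len(blob)
    if current:
        buckets.append(current)
    return buckets


def digest(weights):
    """Order-independent checksum over all tensor names and payloads."""
    h = hashlib.sha256()
    for name in sorted(weights):
        h.update(name.encode())
        h.update(weights[name])
    return h.hexdigest()


def sync_weights(source, bucket_size):
    """Copy every bucket from the source dict into a fresh dict.

    The inner loop is a stand-in for a real P2P transfer (hypothetical).
    """
    received = {}
    for bucket in make_buckets(source, bucket_size):
        for name, blob in bucket:
            received[name] = bytes(blob)
    return received


# Server 0 holds the complete weights; Server 1 starts empty.
server0 = {
    "layer0.weight": b"\x01" * 8,
    "layer0.bias": b"\x02" * 4,
    "layer1.weight": b"\x03" * 8,
}
server1 = sync_weights(server0, bucket_size=12)

# After the sync, both sides should hold identical weights.
assert digest(server0) == digest(server1)
```

Comparing a checksum on both sides is one simple way to confirm that the receiving node ended up with the same weights before the inference engine loads them.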

@kip-cxj changed the title from "[Draft] p2p support for Huawei Ascend NPU" to "[Hardware] p2p support for Huawei Ascend NPU" on Nov 13, 2025
@hanhan-networking

@weixiao-huang Please review this pr.

@weixiao-huang (Collaborator)

Please resolve the conflicts with the main branch. Thanks.

@kip-cxj force-pushed the main branch 4 times, most recently from 52410ac to 96483de, on November 19, 2025 10:11
@weixiao-huang merged commit aded85b into MoonshotAI:main on Nov 19, 2025
2 checks passed