
@15050188022

On some machines there are more RDMA network cards than GPUs. In that case the assertion here fails and throws an error, so it needs to be modified.


@specture724 (Collaborator)

You can set the `NCCL_IB_HCA` environment variable so that NCCL uses only some of the IB cards.
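A minimal sketch of that workaround, assuming the process knows its HCA names and GPU count. `build_nccl_ib_hca` is a hypothetical helper, not part of NCCL or this project, and the device names are made up. It keeps one HCA per GPU and formats the value with NCCL's documented `=` prefix, which requests exact name matches (so `mlx5_1` does not also match `mlx5_10`). The variable has to be set before NCCL is initialized (for example, before the process group is created).

```python
import os
from typing import Sequence


def build_nccl_ib_hca(hcas: Sequence[str], num_gpus: int) -> str:
    """Hypothetical helper: keep the first `num_gpus` HCAs and format an NCCL_IB_HCA value."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if len(hcas) < num_gpus:
        raise ValueError(f"only {len(hcas)} HCAs available for {num_gpus} GPUs")
    # Leading '=' asks NCCL for exact matches instead of prefix matches.
    return "=" + ",".join(hcas[:num_gpus])


if __name__ == "__main__":
    # Assumed device names; on a real host they could be listed with `ibv_devices`.
    available = ["mlx5_0", "mlx5_1", "mlx5_2", "mlx5_3"]
    os.environ["NCCL_IB_HCA"] = build_nccl_ib_hca(available, num_gpus=2)
    print(os.environ["NCCL_IB_HCA"])  # =mlx5_0,mlx5_1
```

Taking the first N cards is only the simplest choice. On a real node you would usually pick the NICs closest to each GPU in the PCIe topology.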

@specture724 (Collaborator)

See #54. We added a log message that explains what happened when this assertion error is raised.
