Why doesn't the theoretical memory estimate take expert parallelism and the partitioning of experts into account? #2438

1195343015 started this conversation in General

1195343015
Dec 1, 2025

https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/training/theoretical_memory_usage.py
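To make the question concrete, here is a minimal sketch of the arithmetic involved. It is not taken from `theoretical_memory_usage.py`. The function name, its parameters, and the simplifications (no biases, no router weights, no pipeline parallelism, experts split evenly across expert-parallel ranks) are all assumptions made for illustration.

```python
def expert_params_per_rank(
    num_layers: int,
    hidden_size: int,
    ffn_hidden_size: int,
    num_experts: int,
    expert_parallel_size: int = 1,
    tensor_parallel_size: int = 1,
    gated: bool = False,
) -> int:
    """Hypothetical helper: expert MLP weights held by one rank.

    Assumptions (not from Megatron-LM): no biases, no router weights,
    every layer is an MoE layer, and experts divide evenly across
    expert-parallel ranks.
    """
    if num_experts % expert_parallel_size != 0:
        raise ValueError("num_experts must be divisible by expert_parallel_size")

    # Each rank stores only its local share of the experts.
    local_experts = num_experts // expert_parallel_size

    # Up projection (doubled for gated activations such as SwiGLU) plus down projection.
    up_proj = hidden_size * ffn_hidden_size * (2 if gated else 1)
    down_proj = ffn_hidden_size * hidden_size
    params_per_expert = up_proj + down_proj

    # Tensor parallelism further shards each expert's weights.
    return num_layers * local_experts * params_per_expert // tensor_parallel_size


if __name__ == "__main__":
    common = dict(num_layers=32, hidden_size=4096, ffn_hidden_size=14336, num_experts=8)
    without_ep = expert_params_per_rank(**common, expert_parallel_size=1)
    with_ep = expert_params_per_rank(**common, expert_parallel_size=8)
    print(f"expert params per rank, EP=1: {without_ep:,}")
    print(f"expert params per rank, EP=8: {with_ep:,}")
```

Under these assumptions, an estimate that multiplies by `num_experts` but never divides by the expert-parallel size overstates the expert weights on each rank by a factor equal to that size.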

Replies: 0 comments
