
Conversation


@fy1214 (Contributor) commented Jan 27, 2026

WIP: support NVFP4 in the Slime RL training pipeline.

@zianglih commented Feb 9, 2026

If using nvfp4 for training (fprop + wgrad + dgrad), it's better to stick to the original TE nvfp4 recipe (https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html#NVFP4-training-recipe): 2D weight quantization (which better preserves the chain rule), Random Hadamard transforms, and stochastic rounding.
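For reference, a minimal sketch of opting into that recipe in Transformer Engine. The recipe class name `NVFP4BlockScaling` and its default settings are assumptions based on the linked primer, so treat this as illustrative rather than the exact API; verify against your TE version.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import NVFP4BlockScaling  # assumed name, per the primer

# Per the primer, this recipe pairs 2D weight quantization with Random
# Hadamard transforms and stochastic rounding on the backward pass.
recipe = NVFP4BlockScaling()

layer = te.Linear(1024, 1024, params_dtype=torch.bfloat16).cuda()
inp = torch.randn(16, 1024, device="cuda", dtype=torch.bfloat16)

# Same autocast entry point as the fp8 recipes; the recipe object
# selects the nvfp4 quantization scheme.
with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    out = layer(inp)
out.sum().backward()
```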

Additionally, I have a TE PR (NVIDIA/TransformerEngine#2644) that allows using nvfp4 for fprop only, while keeping wgrad and dgrad in bf16. In that case, since the backward pass no longer runs in nvfp4, 2D weight quantization isn't needed to preserve the chain rule, and 1D weight quantization gives lower quantization error for free. And since the backward is not in nvfp4, Random Hadamard transforms and stochastic rounding can also be disabled.
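To illustrate the fprop-only idea without guessing at the PR's actual API, here is a conceptual plain-PyTorch sketch. `fake_nvfp4` and `FP4FpropLinear` are hypothetical helpers, not TE code: the forward GEMM sees quantized operands, while the backward reuses the saved bf16 tensors, which is why 1D weight scaling suffices and RHT/stochastic rounding can be dropped.

```python
import torch

def fake_nvfp4(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical placeholder quantizer: real nvfp4 packs e2m1 values with
    # per-block scales; here we round-trip through a coarse grid so the
    # forward pass sees quantization error.
    scale = x.abs().amax().clamp(min=1e-8) / 6.0  # 6.0 = largest |e2m1| value
    return (x / scale).round().clamp(-6, 6) * scale

class FP4FpropLinear(torch.autograd.Function):
    """Forward GEMM on quantized operands; backward entirely in bf16."""

    @staticmethod
    def forward(ctx, inp, weight):
        ctx.save_for_backward(inp, weight)  # save the *bf16* tensors for bwd
        return fake_nvfp4(inp) @ fake_nvfp4(weight).t()

    @staticmethod
    def backward(ctx, grad_out):
        inp, weight = ctx.saved_tensors
        # dgrad and wgrad run on the original bf16 tensors, so no RHT or
        # stochastic rounding is needed to keep gradients well-behaved.
        return grad_out @ weight, grad_out.t() @ inp

# Usage: inp [M, K] @ weight [N, K]^T -> out [M, N]
inp = torch.randn(16, 1024, dtype=torch.bfloat16, requires_grad=True)
weight = torch.randn(1024, 1024, dtype=torch.bfloat16, requires_grad=True)
out = FP4FpropLinear.apply(inp, weight)
out.sum().backward()
```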

