[Draft] feat(recipe): add NVFP4 QAT recipe for Qwen3-30B W4A16 #36

Draft · zhangyimi wants to merge 6 commits into verl-project:main from zhangyimi:qat
Conversation

@zhangyimi commented Feb 3, 2026

Description

This PR adds support for NVFP4 Quantization-Aware Training (QAT) with FSDP, enabling W4A16 (weight-only) quantization during RL training.
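
The block below is a minimal PyTorch sketch of the core idea: weight-only NVFP4 fake quantization with a straight-through estimator. It assumes the standard NVFP4 layout (E2M1 element values quantized in 16-element blocks); the names are illustrative, and the PR's actual kernel is implemented in Triton.

```python
import torch

# Magnitudes representable in FP4 E2M1 (the NVFP4 element format).
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_nvfp4(w: torch.Tensor, block: int = 16) -> torch.Tensor:
    """Quantize-dequantize `w` to NVFP4; gradients flow straight through."""
    # Assumes w.numel() is divisible by the block size (true for typical
    # transformer weight shapes).
    wb = w.reshape(-1, block)
    scale = wb.abs().amax(dim=-1, keepdim=True) / 6.0  # map block max onto FP4 max (6.0)
    scale = scale.clamp(min=1e-12)
    # Real NVFP4 stores the per-block scale in FP8 (E4M3) plus a global
    # FP32 scale; a plain FP32 scale is used here for clarity.
    grid = FP4_GRID.to(w.device, w.dtype)
    q = wb / scale
    # Snap each magnitude to the nearest FP4 grid point, preserving sign.
    idx = (q.abs().unsqueeze(-1) - grid).abs().argmin(dim=-1)
    deq = (grid[idx] * q.sign() * scale).reshape(w.shape)
    # Straight-through estimator: forward sees deq, backward sees identity.
    return w + (deq - w).detach()
```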

What's included

- verl/utils/qat/ module: QATLinear (Triton FP4 fake quantization), scale fusion, the NVFP4 quantizer, and vLLM dynamic weight-loading patches (see the sketch after this list)
- Recipe scripts and configs for Qwen3-30B-A3B W4A16 (full quantization & FFN-only quantization)
- A detailed README with an implementation overview and experimental results
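
To make QATLinear concrete, here is a minimal sketch of a W4A16 linear layer built on the `fake_quant_nvfp4` helper above. The class name and structure are illustrative; the PR's module fuses this into a Triton kernel and handles scale fusion separately.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QATLinearSketch(nn.Linear):
    """W4A16 QAT linear: fake-quantized NVFP4 weights, high-precision activations."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only the weight goes through quantize-dequantize; activations stay
        # in BF16, which is what "W4A16" denotes.
        return F.linear(x, fake_quant_nvfp4(self.weight), self.bias)
```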

Key Results

- Validated on Qwen3-8B-Base (dense) and Qwen3-30B-A3B-Base (MoE): W4A16 QAT achieves training accuracy on par with the BF16 baseline, whereas without QAT the KL divergence explodes and training crashes.
- 70.3% weight-memory reduction on Qwen3-30B-A3B during rollout (56.88 GiB → 16.89 GiB), freeing ~40 GiB for additional KV cache capacity (arithmetic checked below).
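
The 70.3% figure follows directly from the two reported sizes:

```python
>>> bf16_gib, nvfp4_gib = 56.88, 16.89          # rollout weight memory, Qwen3-30B-A3B
>>> round(100 * (1 - nvfp4_gib / bf16_gib), 1)  # percent reduction
70.3
>>> round(bf16_gib - nvfp4_gib, 2)              # GiB freed for KV cache
39.99
```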

VeRL PR: verl-project/verl#5190
README: https://github.com/zhangyimi/verl-recipe/blob/006aa5dabb8dac1f2369e52c3ad27455b84e7799/qat/README.md
