[Question] Is it recommended to use the same examples for SFT and QAT? #1095

@DhruvBhatia0

I'm quantizing a model to NVFP4. The model was SFT'd and then RL fine-tuned, and I'm planning to use QAT to recover accuracy after PTQ.

In examples/llm_qat/, the same Daring-Anteater dataset is used across all three steps (SFT → PTQ calibration → QAT), just with different splits. A few questions:

  1. Calibration vs QAT data — Is using the exact same samples for both SFT and QAT fine-tuning
    intentional/recommended, or just a simplification for the example?

  2. RL-tuned models — My model's final training stage was RL, not SFT. Should QAT fine-tuning data
    match the RL distribution (tool calls, reasoning traces, environment feedback), or the original SFT data, or
    does it not matter much?
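
To make "same dataset, different splits" concrete, here is a minimal sketch of the setup I have in mind. The split names, sizes, seed, and the `make_disjoint_splits` helper are my own placeholders for illustration and are not taken from examples/llm_qat/; it only shows how one pool of samples could be partitioned so that no sample is reused across stages.

```python
import random


def make_disjoint_splits(num_samples, sizes, seed=0):
    """Partition sample indices into named, non-overlapping splits.

    `sizes` maps a (hypothetical) stage name to the number of samples
    that stage should receive.
    """
    if sum(sizes.values()) > num_samples:
        raise ValueError("split sizes exceed dataset size")

    # Shuffle once with a fixed seed so the partition is reproducible.
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)

    # Hand out consecutive slices of the shuffled indices to each stage.
    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = indices[start:start + size]
        start += size
    return splits


# Placeholder sizes: one dataset, three stages, no shared samples.
splits = make_disjoint_splits(
    num_samples=10_000,
    sizes={"sft": 8_000, "ptq_calib": 512, "qat": 1_488},
)
```

Question 1 is essentially whether the `qat` slice should be drawn separately like this, or whether reusing the `sft` samples for QAT is fine.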

Who can help?

@kevalmorabia97 @sugunav14
