[OMNIML-3689] PTQ quant_cfg semantic correction. Design in doc _quant_cfg.rst #1094

shengliangxu wants to merge 62 commits into main from
Conversation
📝 Walkthrough: Refactors quantization configuration from a wildcard-keyed dict schema to an ordered list-of-entry schema.

Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Right now quant_cfg is a dict, but we are using it as if it were a list. When we apply the quant_cfg, we enumerate the items in the dict and apply the configs one by one in modelopt/torch/quantization/conversion.py. This implementation actually has the semantics that later configs take precedence over earlier ones. However, dicts do not have reliable ordering. Therefore, we make quant_cfg a list of patterns:

1. Later config patterns have higher precedence. A later config in the list overrides an earlier config if they target the same module.
2. The config for each module is atomic; each config provides the full information. We do not compose a quant module config from multiple config lines.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
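The two rules above can be illustrated with a toy sketch. The entry shapes mirror the new list format, but the apply loop, default values, and quantizer names here are made up for illustration and are not ModelOpt's implementation:

```python
# Toy illustration of the two rules: later entries override earlier ones,
# and each entry is atomic (the winning entry's cfg fully replaces prior
# state rather than merging into it). DEFAULTS is a made-up stand-in.
DEFAULTS = {"num_bits": 16, "axis": None}

quant_cfg = [
    {"quantizer_path": "layer1.weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
    {"quantizer_path": "layer1.weight_quantizer", "cfg": {"num_bits": 4}},
]

def final_state(name: str) -> dict:
    state = dict(DEFAULTS)
    for entry in quant_cfg:                       # applied in list order
        if entry["quantizer_path"] == name:
            state = {**DEFAULTS, **entry["cfg"]}  # atomic: full replacement
    return state

# The second entry wins wholesale: num_bits becomes 4, and axis reverts
# to the default because configs are not composed across entries.
state = final_state("layer1.weight_quantizer")
```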
…s_partial

set_quantizer_attributes_full updates the full quantizer attributes; it has atomic semantics. set_quantizer_attributes_partial updates just a subset of quantizer attributes; it has merge semantics.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
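A hedged sketch of the difference between the two semantics, with plain dicts standing in for quantizer state (the function names echo the commit; the state model and default values are made up, not ModelOpt's implementation):

```python
# Plain-dict stand-in for quantizer state; default values are invented.
DEFAULTS = {"enable": True, "num_bits": 16, "axis": None}

def set_attributes_full(state: dict, cfg: dict) -> None:
    """Atomic semantics: unspecified fields revert to defaults."""
    state.clear()
    state.update({**DEFAULTS, **cfg})

def set_attributes_partial(state: dict, cfg: dict) -> None:
    """Merge semantics: only the given fields change."""
    state.update(cfg)

q = dict(DEFAULTS)
set_attributes_partial(q, {"num_bits": 8})  # axis keeps its current value
after_partial = dict(q)
set_attributes_full(q, {"axis": 0})         # num_bits reverts to the default
after_full = dict(q)
```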
Force-pushed from 192ea05 to fb3bb07.
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

@@ Coverage Diff @@
## main #1094 +/- ##
==========================================
+ Coverage 73.22% 75.90% +2.68%
==========================================
Files 351 351
Lines 40072 40286 +214
==========================================
+ Hits 29341 30579 +1238
+ Misses 10731 9707 -1024
Flags with carried forward coverage won't be shown.
Force-pushed from 4f38294 to 5115452.
@shengliangxu can you update CHANGELOG.rst as well to mention this updated workflow? Do the ModelOpt examples in Megatron-LM and Megatron-Bridge also need to be updated?
Updated CHANGELOG. Do you know if the code there creates custom quant configs? If they just use the existing built-in configs, then no change is needed.
Seems like they just use standard configs:
meenchen
left a comment
Left some comments from AI that make sense to me. Are there any particular files you want us to pay attention to?
What does this PR do?
Summary
Redesigns the `quant_cfg` configuration format in ModelOpt's PyTorch quantization stack, replacing the previous dict-based format with an ordered list of typed `QuantizerCfgEntry` dicts.

Motivation

The old `quant_cfg` dict had several pain points:

- Dicts do not have reliable ordering, yet configs were applied in enumeration order with later entries taking precedence.
- The `"default"` key: an implicit, undocumented catch-all that was easy to misuse.

New format

`quant_cfg` is now an ordered list of `QuantizerCfgEntry` TypedDicts. Each entry has:

- `quantizer_path` (required): `fnmatch` wildcard matched against quantizer module names
- `cfg` (optional): dict (or list of dicts) of `QuantizerAttributeConfig` fields
- `enable` (optional): toggles the quantizer on/off independently of `cfg`
- `parent_class` (optional): restricts the match to quantizers whose parent module is of the given PyTorch class (e.g. `"nn.BatchNorm2d"`)

Entries are applied in list order; later entries override earlier ones. The canonical pattern is deny-all first (`_base_disable_all`), then selectively re-enable and configure, then apply standard exclusions (`_default_disabled_quantizer_cfg`).

Changes
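To make the entry format and its ordering semantics concrete before the file-by-file changes: below is an illustrative deny-all-then-enable list, plus a toy resolver. The quantizer names and `cfg` values are invented, and the resolver is not ModelOpt's implementation; it only demonstrates the last-match-wins rule.

```python
from fnmatch import fnmatch

# Illustrative list-format quant_cfg following the deny-all-then-enable
# pattern; paths and cfg values are made up for this sketch.
quant_cfg = [
    {"quantizer_path": "*", "enable": False},                # deny-all first
    {"quantizer_path": "*weight_quantizer", "enable": True,  # re-enable weights
     "cfg": {"num_bits": 8}},
    {"quantizer_path": "*lm_head*", "enable": False},        # later entry wins
]

def resolve(name: str) -> dict:
    """Toy last-match-wins lookup over the ordered entry list."""
    matched = {}
    for entry in quant_cfg:
        if fnmatch(name, entry["quantizer_path"]):
            matched = entry  # later matches override earlier ones
    return matched
```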
Core library (`modelopt/torch/quantization/`)

`config.py`:

- `QuantizerCfgEntry` TypedDict (line 163) and `find_quant_cfg_entry_by_path()` helper for exact-match lookup of entries by path.
- `normalize_quant_cfg_list()` (line 1539), which converts legacy formats (flat dict, single-key dicts, `nn.*`-scoped dicts, `"default"` key) to canonical `QuantizerCfgEntry` lists. After normalization every entry is guaranteed to have explicit `quantizer_path`, `enable`, and `cfg` keys.
- Converted `_default_disabled_quantizer_cfg` and `_mamba_moe_disabled_quantizer_cfg` from dicts to lists of `QuantizerCfgEntry`.
- `_base_disable_all` (line 205): the canonical deny-all entry (`[{"quantizer_path": "*", "enable": False}]`).
- Converted the built-in configs (`INT8_DEFAULT_CFG`, `FP8_DEFAULT_CFG`, `NVFP4_DEFAULT_CFG`, etc.) to list format using `*_base_disable_all` and `*_default_disabled_quantizer_cfg` unpacking.
- KV-cache configs (`FP8_KV_CFG`, `NVFP4_KV_CFG`, etc.) are now minimal lists designed to be concatenated with a primary config; they intentionally omit `_base_disable_all` and `"algorithm"`.
- `QuantizeConfig` Pydantic field validators: a `mode="before"` validator that calls `normalize_quant_cfg_list()`, and a `mode="after"` validator that validates `cfg` dicts against `QuantizerAttributeConfig`.
- Updated `need_calibration()` to iterate the normalized list instead of the old dict.
- Changed the `QuantizeQuantCfgType` alias from `dict[str | Callable, ...]` to `list[QuantizerCfgEntry]`.

`conversion.py`:

- Updated `set_quantizer_by_cfg()` (line 217) to iterate the list directly. Each entry's `parent_class` is resolved via `QuantModuleRegistry[parent_class_name]` (the existing `_DMRegistryCls` registry).
- `set_quantizer_attributes_full()` (line 314): full replacement of quantizer attributes from a `QuantizerAttributeConfig`. Unspecified fields revert to defaults, enforcing entry atomicity. Can also upgrade `TensorQuantizer` → `SequentialQuantizer` or downgrade the reverse.
- `set_quantizer_attributes_partial()` (line 384): merges a partial `dict` of attributes into existing quantizer state. Does NOT change quantizer structure. Used for enable-only entries.
- `set_quantizer_by_cfg_context()` context manager (line 447) that temporarily applies a `quant_cfg` list and restores the original quantizer state on exit.
- Kept `set_quantizer_attribute()` (line 525) with a `DeprecationWarning` pointing to the new functions.

`tensor_quantizer.py`:

- `TensorQuantizer.set_from_attribute_config()`: narrowed the type hint from `dict` to `dict[str, Any]`.
- `_axis_setter` and `_block_sizes_setter` custom setters so that `axis` and `block_sizes` changes properly propagate to the calibrator and maintain mutual exclusivity.
- `SequentialQuantizer.set_from_attribute_config()`: narrowed the signature to `list[QuantizerAttributeConfig] | list[dict[str, Any]]` (removed the old union with single values).

`algorithms.py`:

- Updated `_match_quantizer_cfg()` to iterate the list and return a `(matched_cfg, matched_enable)` tuple with last-match-wins semantics.
- Updated `_cfg_to_dict()`, `estimate_quant_compression()`, and `QuantRecipe` to work with the list-based format.
- Updated `get_auto_quantize_config()` to emit list-format `quant_cfg`.

`model_quant.py`:

- `disable_quantizer()` / `enable_quantizer()` now call `set_quantizer_attributes_partial()` directly instead of the deprecated `set_quantizer_attribute()`. Updated docstrings and code examples to show the list format.

`utils/core_utils.py`:

- `disable_lora_quantizers_in_config()` and `update_quant_cfg_with_kv_cache_quant()` updated to append `QuantizerCfgEntry` dicts to the list.

Other: minor updates to `backends/fp8_per_tensor_gemm.py`, `backends/nvfp4_gemm.py`, `compress.py`, `model_calib.py`, `export/unified_export_hf.py`, and `sparsity/attention_sparsity/conversion.py` to use the list format. `onnx/llm_export_utils/quantization_utils.py`: updated quantization config construction to use the list format.

YAML recipes (
`modelopt_recipes/`)

- `general/ptq/fp8_default-fp8_kv.yml`
- `general/ptq/nvfp4_default-fp8_kv.yml`
- `general/ptq/nvfp4_experts_only-fp8_kv.yml`
- `general/ptq/nvfp4_mlp_only-fp8_kv.yml`
- `general/ptq/nvfp4_omlp_only-fp8_kv.yml`
- `models/Step3.5-Flash/nvfp4-mlp-only.yaml`

Documentation (`docs/`)

- New `docs/source/guides/_quant_cfg.rst`: a comprehensive reference covering entry format, ordering semantics, entry atomicity, `enable` vs `cfg` independence, `parent_class` filtering, and common patterns (deny-all-then-enable, customizing a built-in config, building from scratch).
- Updated the `_pytorch_quantization.rst` code examples to show the list format with `copy.deepcopy` and `.append()`.
- Added `_quant_cfg.rst` to the quantization guide table of contents.

Examples
Updated `deepseek/ptq.py`, `diffusers/quantization/config.py`, `llm_ptq/hf_ptq.py`, `llm_qat/main.py`, `vllm_serve/vllm_ptq_utils.py`, `llm_autodeploy/run_auto_quantize.py`, `llm_eval/quantization_utils.py`, `llm_ptq/example_utils.py`, `windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`, and 2 notebooks.

Tests

- New `tests/unit/torch/quantization/test_config_validation.py`: unit tests for `need_calibration()`, `normalize_quant_cfg_list()` (new format, legacy format conversions, error cases), `find_quant_cfg_entry_by_path()`, `_match_quantizer_cfg()`, and the `QuantizeConfig` Pydantic validators.
- Extended `tests/unit/torch/quantization/test_quantize_cpu.py` with tests for `set_quantizer_attributes_full()` (atomicity, `parent_class` filtering, `SequentialQuantizer` creation), list ordering, enable-only entry behavior, and end-to-end legacy dict format.
- Updated `tests/unit/`, `tests/gpu/`, `tests/gpu_megatron/`, and `tests/_test_utils/` to use the list format.

Backward compatibility
`normalize_quant_cfg_list()` is called automatically by the `QuantizeConfig` Pydantic `mode="before"` validator, so existing code passing the old dict-based format (a flat dict like `{"*weight_quantizer": {"num_bits": 8}}`, single-key dict lists, or `nn.*`-scoped dicts with `parent_class` semantics) continues to work without modification. The legacy `"default"` key is converted to `quantizer_path: "*"`. `set_quantizer_attribute()` is preserved as a deprecated wrapper around `set_quantizer_attributes_partial()`.
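As a hedged sketch of what that normalization does for the flat-dict case only: the toy function below is not the real `normalize_quant_cfg_list()` (which handles more legacy shapes), and its `enable` handling is an assumption made for the sketch.

```python
def normalize_flat_dict(legacy: dict) -> list[dict]:
    """Toy flat-dict -> entry-list conversion (illustrative only)."""
    entries = []
    for pattern, attrs in legacy.items():
        entries.append({
            # the legacy "default" key becomes a wildcard quantizer_path
            "quantizer_path": "*" if pattern == "default" else pattern,
            "enable": attrs.get("enable", True),  # assumption for the sketch
            "cfg": attrs,
        })
    return entries

legacy = {"*weight_quantizer": {"num_bits": 8}, "default": {"enable": False}}
entries = normalize_flat_dict(legacy)
```

After normalization every entry carries explicit `quantizer_path`, `enable`, and `cfg` keys, matching the guarantee the PR states for the real function.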
New `test_config_validation.py` with tests for normalization, validation, path lookup, and cfg matching. Extended `test_quantize_cpu.py` with tests for full/partial attribute setting, ordering, atomicity, and legacy backward compatibility.

Additional Information