
[OMNIML-3689] PTQ quant_cfg semantic correction. Design in doc _quant_cfg.rst#1094

Open
shengliangxu wants to merge 62 commits into main from shengliangx/quant_cfg-list
Conversation

@shengliangxu
Collaborator

@shengliangxu shengliangxu commented Mar 22, 2026

What does this PR do?

Summary

Redesigns the quant_cfg configuration format in ModelOpt's PyTorch quantization stack, replacing the previous dict-based format with an ordered list of typed QuantizerCfgEntry dicts.

Motivation

The old quant_cfg dict had several pain points:

  • Ambiguous precedence: no explicit way to reason about which entry wins when multiple keys match a quantizer
  • Mixed key namespaces: wildcard paths and PyTorch class names lived in the same dict level, requiring ad-hoc dispatch
  • Magic "default" key: an implicit, undocumented catch-all that was easy to misuse
  • Poor composability: merging two configs required dict updates that silently discarded keys
  • No YAML round-trip fidelity: the nested structure couldn't be expressed cleanly in YAML
New format

quant_cfg is now an ordered list of QuantizerCfgEntry TypedDicts. Each entry has:

  • quantizer_path (required): fnmatch wildcard matched against quantizer module names
  • cfg (optional): dict (or list of dicts) of QuantizerAttributeConfig fields
  • enable (optional): toggles quantizer on/off independently of cfg
  • parent_class (optional): restricts match to quantizers whose parent module is of the given PyTorch class (e.g. "nn.BatchNorm2d")

Entries are applied in list order; later entries override earlier ones. The canonical pattern is deny-all first (_base_disable_all), then selectively re-enable and configure, then apply standard exclusions (_default_disabled_quantizer_cfg).
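The canonical pattern can be sketched as a concrete list. This is an illustrative example, not a shipped config: the attribute values num_bits and axis stand in for QuantizerAttributeConfig fields.

```python
# Illustrative quant_cfg following the deny-all-then-enable pattern.
quant_cfg = [
    # 1. Deny-all baseline: disable every quantizer first.
    {"quantizer_path": "*", "enable": False},
    # 2. Selectively re-enable and configure weight quantizers.
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
    # 3. Standard exclusion applied last, so it wins over entry 2
    #    for lm_head's weight quantizer.
    {"quantizer_path": "*lm_head*", "enable": False},
]
```

Because later entries override earlier ones, an lm_head weight quantizer ends up disabled even though entry 2 also matches it.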

Changes

Core library (modelopt/torch/quantization/)

  • config.py:

    • Added QuantizerCfgEntry TypedDict (line 163) and find_quant_cfg_entry_by_path() helper for exact-match lookup of entries by path.
    • Added normalize_quant_cfg_list() (line 1539) that converts legacy formats (flat dict, single-key dicts, nn.*-scoped dicts, "default" key) to canonical QuantizerCfgEntry lists. After normalization every entry is guaranteed to have explicit quantizer_path, enable, and cfg keys.
    • Converted _default_disabled_quantizer_cfg and _mamba_moe_disabled_quantizer_cfg from dicts to lists of QuantizerCfgEntry.
    • Added _base_disable_all (line 205): canonical deny-all entry ([{"quantizer_path": "*", "enable": False}]).
    • Converted all ~30 built-in config constants (INT8_DEFAULT_CFG, FP8_DEFAULT_CFG, NVFP4_DEFAULT_CFG, etc.) to list format using *_base_disable_all and *_default_disabled_quantizer_cfg unpacking.
    • KV-cache configs (FP8_KV_CFG, NVFP4_KV_CFG, etc.) are now minimal lists designed to be concatenated with a primary config — they intentionally omit _base_disable_all and "algorithm".
    • Added two QuantizeConfig Pydantic field validators: a mode="before" validator that calls normalize_quant_cfg_list(), and a mode="after" validator that validates cfg dicts against QuantizerAttributeConfig.
    • Updated need_calibration() to iterate the normalized list instead of the old dict.
    • Changed QuantizeQuantCfgType alias from dict[str | Callable, ...] to list[QuantizerCfgEntry].
  • conversion.py:

    • Rewrote set_quantizer_by_cfg() (line 217) to iterate the list directly. Each entry's parent_class is resolved via QuantModuleRegistry[parent_class_name] (the existing _DMRegistryCls registry).
    • Added set_quantizer_attributes_full() (line 314): full replacement of quantizer attributes from a QuantizerAttributeConfig. Unspecified fields revert to defaults, enforcing entry atomicity. Can also upgrade a TensorQuantizer to a SequentialQuantizer, or downgrade the reverse.
    • Added set_quantizer_attributes_partial() (line 384): merges a partial dict of attributes into existing quantizer state. Does NOT change quantizer structure. Used for enable-only entries.
    • Added set_quantizer_by_cfg_context() context manager (line 447) that temporarily applies a quant_cfg list and restores original quantizer state on exit.
    • Deprecated set_quantizer_attribute() (line 525) with a DeprecationWarning pointing to the new functions.
  • tensor_quantizer.py:

    • TensorQuantizer.set_from_attribute_config(): narrowed type hint from dict to dict[str, Any].
    • Added _axis_setter and _block_sizes_setter custom setters so that axis and block_sizes changes properly propagate to the calibrator and maintain mutual exclusivity.
    • SequentialQuantizer.set_from_attribute_config(): narrowed signature to list[QuantizerAttributeConfig] | list[dict[str, Any]] (removed the old union with single values).
  • algorithms.py:

    • Updated _match_quantizer_cfg() to iterate the list and return (matched_cfg, matched_enable) tuple with last-match-wins.
    • Updated _cfg_to_dict(), estimate_quant_compression(), and QuantRecipe to work with the list-based format.
    • Updated get_auto_quantize_config() to emit list-format quant_cfg.
  • model_quant.py: disable_quantizer() / enable_quantizer() now call set_quantizer_attributes_partial() directly instead of the deprecated set_quantizer_attribute(). Updated docstrings and code examples to show the list format.

  • utils/core_utils.py: disable_lora_quantizers_in_config() and update_quant_cfg_with_kv_cache_quant() updated to append QuantizerCfgEntry dicts to the list.

  • Other: minor updates to backends/fp8_per_tensor_gemm.py, backends/nvfp4_gemm.py, compress.py, model_calib.py, export/unified_export_hf.py, and sparsity/attention_sparsity/conversion.py to use the list format.

  • onnx/llm_export_utils/quantization_utils.py: Updated quantization config construction to use list format.
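The list-order semantics behind the rewritten set_quantizer_by_cfg() can be sketched as follows. apply_quant_cfg is a hypothetical stand-in: it shows only the fnmatch path matching and last-match-wins resolution, and ignores the parent_class registry lookup described above.

```python
import fnmatch
from typing import Any


def apply_quant_cfg(
    quantizer_names: list[str], quant_cfg: list[dict[str, Any]]
) -> dict[str, dict[str, Any]]:
    """Hypothetical sketch: walk entries in list order so that a later
    matching entry atomically replaces an earlier one per quantizer."""
    resolved: dict[str, dict[str, Any]] = {}
    for entry in quant_cfg:
        for name in quantizer_names:
            if fnmatch.fnmatch(name, entry["quantizer_path"]):
                resolved[name] = entry  # whole-entry replacement, not a merge
    return resolved
```

In the actual implementation, each matched entry is then applied via set_quantizer_attributes_full() (atomic cfg entries) or set_quantizer_attributes_partial() (enable-only entries).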

YAML recipes (modelopt_recipes/)

  • Converted all 5 general PTQ recipes to the new list format:
    • general/ptq/fp8_default-fp8_kv.yml
    • general/ptq/nvfp4_default-fp8_kv.yml
    • general/ptq/nvfp4_experts_only-fp8_kv.yml
    • general/ptq/nvfp4_mlp_only-fp8_kv.yml
    • general/ptq/nvfp4_omlp_only-fp8_kv.yml
  • Converted model-specific recipe: models/Step3.5-Flash/nvfp4-mlp-only.yaml

Documentation (docs/)

  • New guide: docs/source/guides/_quant_cfg.rst — comprehensive reference covering entry format, ordering semantics, entry atomicity, enable vs cfg independence, parent_class filtering, and common patterns (deny-all-then-enable, customizing a built-in config, building from scratch).
  • Updated _pytorch_quantization.rst code examples to show the list format with copy.deepcopy and .append().
  • Added _quant_cfg.rst to the quantization guide table of contents.

Examples

  • Updated all quantization examples to use the list format: deepseek/ptq.py, diffusers/quantization/config.py, llm_ptq/hf_ptq.py, llm_qat/main.py, vllm_serve/vllm_ptq_utils.py, llm_autodeploy/run_auto_quantize.py, llm_eval/quantization_utils.py, llm_ptq/example_utils.py, windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py, and 2 notebooks.

Tests

  • New test file: tests/unit/torch/quantization/test_config_validation.py — unit tests for need_calibration(), normalize_quant_cfg_list() (new format, legacy format conversions, error cases), find_quant_cfg_entry_by_path(), _match_quantizer_cfg(), and QuantizeConfig Pydantic validators.
  • Extended tests/unit/torch/quantization/test_quantize_cpu.py with tests for set_quantizer_attributes_full() (atomicity, parent_class filtering, SequentialQuantizer creation), list ordering, enable-only entry behavior, and end-to-end legacy dict format.
  • Updated 20+ existing test files across tests/unit/, tests/gpu/, tests/gpu_megatron/, and tests/_test_utils/ to use the list format.
Backward compatibility

normalize_quant_cfg_list() is called automatically by the QuantizeConfig Pydantic mode="before" validator, so existing code passing the old dict-based format (flat dict like {"*weight_quantizer": {"num_bits": 8}}, single-key dict lists, or nn.*-scoped dicts with parent_class semantics) continues to work without modification. The legacy "default" key is converted to quantizer_path: "*".
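As a sketch of that conversion, normalize_legacy below is a hypothetical stand-in for normalize_quant_cfg_list() covering only the flat-dict case; the real function also handles single-key dict lists and nn.*-scoped dicts.

```python
from typing import Any


def normalize_legacy(quant_cfg: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical sketch of flat-dict normalization: every output entry
    gets explicit quantizer_path, enable, and cfg keys, and the legacy
    "default" key becomes quantizer_path "*"."""
    entries = []
    for key, value in quant_cfg.items():
        path = "*" if key == "default" else key
        enable = value.get("enable", True) if isinstance(value, dict) else True
        cfg = value if enable else None
        entries.append({"quantizer_path": path, "enable": enable, "cfg": cfg})
    return entries
```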

set_quantizer_attribute() is preserved as a deprecated wrapper around set_quantizer_attributes_partial().

Test coverage

  • Unit tests: new test_config_validation.py with tests for normalization, validation, path lookup, and cfg matching. Extended test_quantize_cpu.py with tests for full/partial attribute setting, ordering, atomicity, and legacy backward compatibility.
  • System testing:
python examples/llm_ptq/hf_ptq.py \
      --model Qwen/Qwen3-8B  \
      --recipe general/ptq/fp8_default-fp8_kv \
      --export_path=build/fp8_default-fp8_kv42  \
      --calib_size=16 \
      --batch_size=0 \
      --trust_remote_code \
      --export_fmt=hf

Additional Information

@copy-pr-bot

copy-pr-bot bot commented Mar 22, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@coderabbitai
Contributor

coderabbitai bot commented Mar 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Refactors quantization configuration from a wildcard-keyed dict schema to an ordered list-of-entry schema (entries with quantizer_path, optional parent_class, optional cfg, optional enable) and updates runtime APIs, normalization/validation, examples, recipes, tests, backends, and docs to the new format.

Changes

  • Core Config & Types (modelopt/torch/quantization/config.py): Replaced dict-based quant_cfg with list[QuantizerCfgEntry]; added QuantizerCfgEntry TypedDict, normalize_quant_cfg_list(), Pydantic validators, canonical defaults, and stricter validation (entries must include cfg or enable).
  • Conversion API & Attribute Setters (modelopt/torch/quantization/conversion.py): set_quantizer_by_cfg/set_quantizer_by_cfg_context now accept normalized list configs; added set_quantizer_attributes_full and set_quantizer_attributes_partial; centralized matching (_match_quantizer) and split full-vs-partial application logic.
  • Quantization Runtime & Model Integration (modelopt/torch/quantization/model_quant.py, model_calib.py, algorithms.py, compress.py): Adapted codepaths to read/produce list-form quant_cfg; updated estimation/compression logic to extract nested cfg; replaced set_quantizer_attribute usage with partial/full attribute setters; updated KV-cache and disable/enable flows to append list entries.
  • Quantizer Implementations & Utilities (modelopt/torch/quantization/nn/modules/tensor_quantizer.py, modelopt/torch/quantization/nn/modules/..., modelopt/torch/quantization/utils/core_utils.py): Adjusted type annotations and setters (axis, block_sizes) to keep calibrator state in sync; updated helper APIs to accept/return list-form config and to append KV-cache entries; added/updated public typing imports.
  • Backends Availability Checks (modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py, .../nvfp4_gemm.py): Switched logic to scan the quant_cfg list for *input_quantizer/*weight_quantizer entries and assert extracted cfg types before validating module compatibility.
  • Conversion Entry Points & Export (modelopt/torch/quantization/conversion.py, modelopt/torch/export/unified_export_hf.py, modelopt/onnx/llm_export_utils/quantization_utils.py): Pass normalized list-form quant_cfg into conversion and export helpers; get_quant_config now rebuilds/appends lm_head overrides as list entries.
  • Examples & Notebooks (examples/diffusers/..., examples/llm_ptq/..., examples/llm_autodeploy/run_auto_quantize.py, examples/llm_eval/..., examples/vllm_serve/fakequant_worker.py, examples/windows/..., examples/llm_qat/main.py, notebooks): Converted example configs and helper logic from dict-based quant_cfg to list entries; updated lookups from direct dict indexing to list traversal/search/append; adjusted a few function signatures to accept list-form KV configs.
  • Recipes (YAML) (modelopt_recipes/general/ptq/*.yml): Refactored recipe quant_cfg sections from key-mapped forms to ordered lists of rule objects (quantizer_path, enable, cfg, optional parent_class); replaced implicit default semantics with explicit catch-all entries.
  • Tests & Test Utilities (tests/_test_utils/..., tests/.../torch/quantization/*, tests/unit/torch/quantization/*, tests/gpu/...): Migrated fixtures, helpers, and tests to list-based quant_cfg; updated call sites to use set_quantizer_attributes_partial/set_quantizer_attributes_full; added tests for normalize_quant_cfg_list() and the new attribute-setter semantics; adapted many assertions and search logic to iterate over list entries.
  • Docs & Guides (docs/source/guides/1_quantization.rst, _pytorch_quantization.rst, _quant_cfg.rst): Added _quant_cfg.rst documenting the new list schema, per-entry fields, validation rules, and precedence/atomicity semantics; updated PyTorch quantization guide examples to deep-copy defaults and use list-based quant_cfg; added the page to the toctree.
  • Misc Examples / Config Constants (examples/diffusers/quantization/config.py, tests/_test_utils/...): Reworked module-level default config constants to list form; adjusted helper setters to inject trt_high_precision_dtype or calibrator configs into nested cfg entries via list traversal/appends.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 60.40%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (3 passed)
  • Title check (✅ Passed): the PR title accurately describes the main change: converting quant_cfg from the dict-based to the list-based format, with the design documented in _quant_cfg.rst.
  • Security Anti-Patterns (✅ Passed): no security anti-patterns found; torch.load, numpy.load, trust_remote_code, eval/exec, and #nosec comments are absent from all modified files.
  • Description Check (✅ Passed): check skipped; CodeRabbit's high-level summary is enabled.


Right now the quant_cfg is a dict, but we use it as if it were a list. When we apply the quant_cfg, we enumerate the items in the dict and apply the configs one by one in modelopt/torch/quantization/conversion.py. This implementation effectively gives later configs higher precedence than earlier ones. However, a dict does not express this ordering reliably.

Therefore, we make quant_cfg a list of patterns:

1. Later config patterns have higher precedence. A later config in the
   list overrides an earlier config if they target the same module.

2. The config for each module is atomic: each config provides the full
   information. We do not compose a quant module config from multiple
   config lines.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
…s_partial

set_quantizer_attributes_full updates the full set of quantizer
attributes; it has atomic semantics.

set_quantizer_attributes_partial updates just a partial set of quantizer
attributes; it has merge semantics.
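The contrast between the two semantics can be sketched with plain dicts. The DEFAULTS values here are hypothetical, used only to show the difference between atomic replacement and merging.

```python
DEFAULTS = {"num_bits": 8, "axis": None, "enable": True}  # hypothetical defaults


def set_full(cfg: dict) -> dict:
    # Atomic semantics: fields not named in cfg revert to defaults.
    return {**DEFAULTS, **cfg}


def set_partial(state: dict, cfg: dict) -> dict:
    # Merge semantics: fields not named in cfg keep their current values.
    return {**state, **cfg}
```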

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@shengliangxu shengliangxu force-pushed the shengliangx/quant_cfg-list branch from 192ea05 to fb3bb07 Compare March 23, 2026 00:19
@codecov

codecov bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 86.37771% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.90%. Comparing base (df80a0f) to head (6e41b4d).

Files with missing lines Patch % Lines
modelopt/torch/quantization/conversion.py 78.78% 21 Missing ⚠️
modelopt/torch/quantization/config.py 93.80% 7 Missing ⚠️
...delopt/onnx/llm_export_utils/quantization_utils.py 50.00% 5 Missing ⚠️
modelopt/torch/quantization/algorithms.py 90.90% 5 Missing ⚠️
...torch/quantization/backends/fp8_per_tensor_gemm.py 70.00% 3 Missing ⚠️
modelopt/torch/quantization/backends/nvfp4_gemm.py 66.66% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1094      +/-   ##
==========================================
+ Coverage   73.22%   75.90%   +2.68%     
==========================================
  Files         351      351              
  Lines       40072    40286     +214     
==========================================
+ Hits        29341    30579    +1238     
+ Misses      10731     9707    -1024     
Flag Coverage Δ
examples 43.86% <60.06%> (+5.30%) ⬆️
gpu 56.89% <62.53%> (-0.21%) ⬇️
unit 54.81% <76.78%> (+0.06%) ⬆️

@shengliangxu shengliangxu force-pushed the shengliangx/quant_cfg-list branch from 4f38294 to 5115452 Compare March 23, 2026 03:18
@shengliangxu shengliangxu requested a review from a team as a code owner April 1, 2026 16:32
shengliangxu and others added 14 commits April 1, 2026 10:00
@kevalmorabia97
Collaborator

@shengliangxu can you update CHANGELOG.rst as well to mention this updated workflow? Do the ModelOpt examples in Megatron-LM and Megatron-Bridge also need to be updated?

shengliangxu and others added 2 commits April 2, 2026 12:03
@shengliangxu
Collaborator Author

@shengliangxu can you update CHANGELOG.rst as well to mention this updated workflow? Do the ModelOpt examples in Megatron-LM and Megatron-Bridge also need to be updated?

Updated CHANGELOG

Do you know if the code there creates custom quant configs? If they just use the existing builtin configs, then no change is needed.

@kevalmorabia97
Collaborator

@shengliangxu can you update CHANGELOG.rst as well to mention this updated workflow? Do the ModelOpt examples in Megatron-LM and Megatron-Bridge also need to be updated?

Updated CHANGELOG

Do you know if the code there creates custom quant configs? If they just use the existing builtin configs, then no change is needed.

Seems like they just use standard configs:

Contributor

@meenchen meenchen left a comment


Left some comments from AI that make sense to me. Are there any particular files you want us to pay attention to?

shengliangxu and others added 2 commits April 3, 2026 15:16