[NVBug: 6000530] Fix AWQ crash for uncalibrated MoE experts#1142
Conversation
…search phase

When `moe_calib_experts_ratio < 1.0`, some MoE experts may never receive tokens during the AWQ cache phase, leaving `act_scale` as a Python float (`0.0`) instead of a tensor. During the search phase, these uncalibrated experts crash in `get_scale()` on `float.pow()`.

Fix by disabling AWQ for experts with `num_cache_steps == 0` before the search phase begins, so they gracefully fall back to max calibration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
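The crash described above can be reproduced without torch. Below is a minimal pure-Python sketch; `FakeTensor` and `get_scale_sketch` are illustrative stand-ins, not ModelOpt's actual classes. It shows why a float `act_scale` breaks the search phase:

```python
# Hypothetical stand-ins for the PR's scenario: act_scale starts as the Python
# float 0.0 and is only promoted to a tensor when an expert sees tokens during
# the cache phase.
class FakeTensor:
    """Stand-in for torch.Tensor, exposing just the .pow() that get_scale() calls."""
    def __init__(self, v):
        self.v = v

    def pow(self, p):
        return FakeTensor(self.v ** p)

def get_scale_sketch(act_scale, alpha=0.5):
    # Mirrors the crashing call in the search phase: act_scale.pow(alpha)
    return act_scale.pow(alpha)

calibrated = FakeTensor(4.0)  # expert that received tokens: act_scale is a tensor
uncalibrated = 0.0            # expert that received no tokens: act_scale stays a float

assert get_scale_sketch(calibrated).v == 2.0
try:
    get_scale_sketch(uncalibrated)
except AttributeError as e:
    print(e)  # 'float' object has no attribute 'pow'
```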
📝 Walkthrough: Pre-search now skips AWQ parameter search for quantized linear modules with `num_cache_steps == 0`.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed
Codecov Report
❌ Patch coverage is …

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #1142      +/-   ##
==========================================
+ Coverage   70.19%   70.20%   +0.01%
==========================================
  Files         230      230
  Lines       26073    26080       +7
==========================================
+ Hits        18302    18310       +8
+ Misses       7771     7770       -1
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/model_calib.py`:
- Around line 1182-1200: The loop that handles uncalibrated experts leaves input
quantization disabled because setup() may have turned off module.input_quantizer
but postprocess is skipped when module.awq_lite.num_cache_steps == 0; modify the
block handling those modules (the for loop iterating model.named_modules(), the
branch checking is_quantized_linear(module) && hasattr(module, "awq_lite") &&
module.awq_lite.num_cache_steps == 0) to re-enable the input_quantizer state it
originally had: after setting module.input_quantizer.pre_quant_scale and before
disabling module.awq_lite.is_enabled, restore
module.input_quantizer._enable_pre_quant_scale (or call the appropriate
re-enable API on input_quantizer) to the value it had prior to setup() so
uncalibrated experts that started with input quantization enabled end up
re-enabled.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6b32c7d9-8fae-4245-9dc9-bf46dfda9d9f
📒 Files selected for processing (1)
modelopt/torch/quantization/model_calib.py
```python
# Handle uncalibrated experts (e.g. when moe_calib_experts_ratio < 1.0,
# some experts may never receive tokens during the cache phase, leaving act_scale
# as a Python float instead of a tensor, which would crash in get_scale()).
# We fully handle them here: max calibrate weights, apply a neutral (all-ones)
# pre_quant_scale for export consistency, and disable AWQ search.
for name, module in model.named_modules():
    if (
        is_quantized_linear(module)
        and hasattr(module, "awq_lite")
        and module.awq_lite.num_cache_steps == 0
    ):
        with enable_weight_access_and_writeback(module, model, name_to_module):
            max_calibrate(module, lambda module: module.weight_quantizer(module.weight))
        ones_scale = torch.ones(
            module.weight.shape[1], dtype=module.weight.dtype, device=module.weight.device
        )
        module.input_quantizer._enable_pre_quant_scale = True
        module.input_quantizer.pre_quant_scale = ones_scale
        module.awq_lite.is_enabled = False
```
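The all-ones `pre_quant_scale` registered above is a multiplicative no-op on the activation, which is what makes it a safe neutral value for experts that were never calibrated. A pure-Python sketch (the helper name is illustrative, not ModelOpt's API):

```python
# pre_quant_scale multiplies the activation elementwise before quantization,
# so a vector of ones leaves an uncalibrated expert's inputs unchanged while
# still giving export a real tensor to stack alongside calibrated experts.
def apply_pre_quant_scale(x, scale):
    return [xi * si for xi, si in zip(x, scale)]

x = [0.5, -1.25, 3.0]
ones_scale = [1.0] * len(x)
assert apply_pre_quant_scale(x, ones_scale) == x  # identity: nothing is rescaled
```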
Missing input_quantizer re-enable for uncalibrated experts.
When setup() runs, it disables input_quantizer if it was originally enabled. For modules with num_cache_steps == 0, postprocess is skipped (lines 1234-1236), so the input_quantizer is never re-enabled. This will leave input quantization disabled for uncalibrated experts that originally had it enabled.
🐛 Proposed fix to re-enable input_quantizer

```diff
  module.input_quantizer._enable_pre_quant_scale = True
  module.input_quantizer.pre_quant_scale = ones_scale
+ if module.awq_lite.is_input_quantized:
+     module.input_quantizer.enable()
  module.awq_lite.is_enabled = False
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/model_calib.py` around lines 1182 - 1200, The
loop that handles uncalibrated experts leaves input quantization disabled
because setup() may have turned off module.input_quantizer but postprocess is
skipped when module.awq_lite.num_cache_steps == 0; modify the block handling
those modules (the for loop iterating model.named_modules(), the branch checking
is_quantized_linear(module) && hasattr(module, "awq_lite") &&
module.awq_lite.num_cache_steps == 0) to re-enable the input_quantizer state it
originally had: after setting module.input_quantizer.pre_quant_scale and before
disabling module.awq_lite.is_enabled, restore
module.input_quantizer._enable_pre_quant_scale (or call the appropriate
re-enable API on input_quantizer) to the value it had prior to setup() so
uncalibrated experts that started with input quantization enabled end up
re-enabled.
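The save-and-restore pattern the reviewer is asking for can be sketched in a few lines. `Quantizer` here is a toy stand-in, not ModelOpt's actual quantizer class; the point is only that a code path skipping `postprocess()` must restore the enabled flag itself:

```python
# Toy stand-in for a quantizer with an on/off flag.
class Quantizer:
    def __init__(self, enabled=True):
        self.enabled = enabled

    def disable(self):
        self.enabled = False

    def enable(self):
        self.enabled = True

q = Quantizer(enabled=True)
was_input_quantized = q.enabled  # remembered at setup() time
q.disable()                      # setup() turns input quantization off for caching

# ...fallback path for uncalibrated experts, which skips postprocess()...

if was_input_quantized:          # the proposed fix: re-enable only if it started on
    q.enable()
assert q.enabled
```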
♻️ Duplicate comments (1)

modelopt/torch/quantization/model_calib.py (1)

1226-1242: ⚠️ Potential issue | 🟠 Major

Restore `input_quantizer` in the uncalibrated-expert fallback.

`AWQLiteHelper.setup()` disables the input quantizer at lines 1009-1015, and the normal re-enable path lives in `postprocess()` at lines 1204-1215. Because this branch skips `postprocess()`, experts that started with input quantization enabled silently stay disabled after AWQ completes.

🐛 Proposed fix

```diff
  module.input_quantizer.pre_quant_scale = torch.ones(
      w_shape, dtype=w_dtype, device=w_device,
  )
+ if module.awq_lite.is_input_quantized:
+     module.input_quantizer.enable()
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/model_calib.py` around lines 1226-1242: this branch skips postprocess(), so restore the input quantizer exactly as postprocess() does. After the weight-calibration block (inside the uncalibrated-expert fallback), re-enable the module's input quantizer and set its pre-quant scale state by applying the same changes postprocess() applies, e.g. flip the input_quantizer enabled flag back on, set module.input_quantizer._enable_pre_quant_scale = True, and set module.input_quantizer.pre_quant_scale = torch.ones(...) (use w_shape/w_dtype/w_device), mirroring AWQLiteHelper.setup() and postprocess() behavior so experts that started with input quantization enabled are re-enabled here as well.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@modelopt/torch/quantization/model_calib.py`:
- Around line 1226-1242: This branch skips postprocess(), so restore the input
quantizer exactly as postprocess() does: after the weight-calibration block
(inside the uncalibrated-expert fallback), re-enable the module's input
quantizer and set its pre-quant scale state by applying the same changes
postprocess() applies — e.g. flip the input_quantizer enabled flag back on and
set module.input_quantizer._enable_pre_quant_scale = True and
module.input_quantizer.pre_quant_scale = torch.ones(...) (use
w_shape/w_dtype/w_device), mirroring AWQLiteHelper.setup and postprocess()
behavior so experts that started with input quantization enabled are re-enabled
here as well.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1b2b5045-67b9-42d6-a698-d0441e711bdb
📒 Files selected for processing (1)
modelopt/torch/quantization/model_calib.py
Summary

- **Problem:** `AttributeError: 'float' object has no attribute 'pow'` when running AWQ lite with `moe_calib_experts_ratio < 1.0` on MoE models (e.g. Qwen3-30B-A3B).
- **Root cause:** With `moe_calib_experts_ratio=0.5`, some MoE experts receive zero tokens during the AWQ cache phase, leaving `act_scale` as a Python float `0.0` instead of a tensor. This causes two failures:
  - `get_scale()` crashes because `float.pow()` doesn't exist.
  - Calibrated experts get a `pre_quant_scale` but uncalibrated ones don't, causing `torch.stack()` to fail on mixed `None`/tensor values in `preprocess_linear_fusion()`.
- **Fix:** Handle uncalibrated experts (those with `num_cache_steps == 0`) in two stages:
  - Disable AWQ search (`is_enabled = False`) to prevent the `get_scale()` crash on a float `act_scale`.
  - Max calibrate the weights and register a neutral (all-ones) `pre_quant_scale` so export can stack scaling factors consistently across all experts. The `pre_quant_scale` buffer must be registered outside `enable_weight_access_and_writeback` because HF accelerate's `post_forward` hook drops newly-registered submodule buffers.

Test plan

- `Qwen/Qwen3-30B-A3B`, `--qformat int4_awq`, `--moe_calib_experts_ratio 0.5`: verify no crash during calibration and export.

🤖 Generated with Claude Code
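The second failure mode in the summary, mixed `None`/tensor values at fusion time, can also be sketched without torch. `stack_scales` below is a hypothetical analogue of the stacking done in `preprocess_linear_fusion()`, not the real function:

```python
# Stacking per-expert scales fails if any expert never got a pre_quant_scale.
def stack_scales(scales):
    if any(s is None for s in scales):
        raise TypeError("expected a sequence of tensors, got None")
    return [list(s) for s in scales]

before_fix = [[1.0, 1.0], None, [0.5, 2.0]]       # uncalibrated expert contributes None
after_fix = [[1.0, 1.0], [1.0, 1.0], [0.5, 2.0]]  # neutral all-ones scale instead

try:
    stack_scales(before_fix)
except TypeError:
    pass  # reproduces the export-time crash
assert stack_scales(after_fix) == after_fix
```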