Add experimental support for transformers>=5.0 + min torch 2.8 + bug fixes for tests #975

Open

kevalmorabia97 wants to merge 21 commits into main from kmorabi/bump-transformers-5.0

Conversation

@kevalmorabia97
Collaborator

@kevalmorabia97 kevalmorabia97 commented Mar 4, 2026

What does this PR do?

  • Add experimental support for transformers >=5.0 and remove usages of deprecated APIs: https://github.com/huggingface/transformers/blob/main/MIGRATION_GUIDE_V5.md
  • ⚠️ For accelerate examples that used --warmup-ratio: float (deprecated in 5.x), we now use --warmup-steps: float | int, which is interpreted as a ratio when a float is passed, but only on 5.x. On 4.x, a float value errors out and prompts the user to either switch back to --warmup-ratio or pass an absolute integer step count (see the sketch after this list).
  • ⚠️ Unified Hugging Face checkpoint export for quantized checkpoints may not work yet for some models with transformers>=5.0, since it requires several fixes (e.g. the change in how MoE experts are organized)
    • Added a workaround for TRT-LLM's imports of deprecated transformers functions so that TRT-LLM based GPU unit tests pass. Model deployment still needs proper fixes directly in TRT-LLM, so the llm/vlm PTQ example tests continue to run with transformers 4.57
    • Everything except PTQ and Export should work fine with transformers>=5.0
  • Remove hard-coded trust_remote_code=True
  • Bump minimum torch to 2.8 and enable torch 2.11 CI/CD testing
  • Fix tests/examples/speculative_decoding, which was previously silently skipped
  • NOTE: The upcoming NeMo 26.04 container might ship with transformers>=5.0
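As a rough illustration of the --warmup-steps behavior described above, here is a hypothetical sketch of how a float ratio vs. integer step count could be handled; the parser wiring and helper name are illustrative, not the exact code in this PR:

```python
# Hypothetical sketch only: parse --warmup-steps as an int step count,
# or as a float ratio when transformers>=5.0 is installed.
import argparse

import transformers
from packaging.version import Version


def parse_warmup_steps(value: str) -> float | int:
    """Accept an integer step count, or a float ratio (transformers>=5.0 only)."""
    number = float(value)
    if number.is_integer():
        return int(number)
    if Version(transformers.__version__) < Version("5.0"):
        raise argparse.ArgumentTypeError(
            "Float warmup ratios require transformers>=5.0; pass an integer "
            "step count or use --warmup-ratio instead."
        )
    return number


parser = argparse.ArgumentParser()
parser.add_argument("--warmup-steps", type=parse_warmup_steps, default=0)
args = parser.parse_args(["--warmup-steps", "0.1"])
print(args.warmup_steps)  # 0.1, interpreted as a ratio on transformers>=5.0
```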

Testing

  • CI/CD tests passing
  • Manually ran unit tests and GPU tests with transformers 4.56 and 5.4
  • Manually ran example tests (except the trt-llm container tests) with transformers 4.56 and 5.4
  • 2-GPU nightly CI/CD tests manually triggered and passing

Before your PR is "Ready for review"

Make sure you read and follow the Contributor guidelines and that your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, using torch.load(..., weights_only=True), avoiding pickle, etc.).
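As a minimal sketch of the two security practices mentioned above (the checkpoint name, model name, and flag wiring are placeholders, not code from this PR):

```python
# Illustrative sketch: make trust_remote_code an opt-in flag with a secure
# default, and load checkpoints with weights_only=True.
import argparse
import os

import torch
from transformers import AutoModelForCausalLM

parser = argparse.ArgumentParser()
# Opt-in flag instead of hardcoding trust_remote_code=True.
parser.add_argument("--trust_remote_code", action="store_true", default=False)
args = parser.parse_args()

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", trust_remote_code=args.trust_remote_code
)

# weights_only=True rejects arbitrary pickled objects in the checkpoint.
if os.path.isfile("modelopt_state.pth"):
    state = torch.load("modelopt_state.pth", weights_only=True)
```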

  • Is this change backward compatible?: ✅
  • If you copied code from any other source, did you follow IP policy in CONTRIBUTING.md?: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: ✅

Summary by CodeRabbit

  • New Features

    • Make remote-code usage opt-in via a configurable --trust_remote_code flag across examples and tools.
  • Bug Fixes

    • Improve checkpoint/resume detection and related training guidance to avoid spurious errors.
  • Refactor

    • Consolidate dtype/config naming, switch warmup settings from ratio → steps, and unify tokenizer invocation patterns (see the sketch after this list).
  • Documentation

    • Simplify changelog title and add misc notes for release 0.44.
  • Chores

    • Remove scheduled PR-branch cleanup workflow and relax/remove several transformers version pins.
  • Tests

    • Adjust test gates, skips, and structures to align with updated deps and behaviors.
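A minimal sketch of the tokenizer-call unification mentioned above; the checkpoint name and input texts are illustrative:

```python
# Sketch: replace the deprecated batch_encode_plus pattern with a direct
# tokenizer(...) call, which works the same on transformers 4.x and 5.x.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
texts = ["Hello world", "Quantize all the things"]

# Deprecated pattern being removed:
#   batch = tokenizer.batch_encode_plus(texts, padding=True, return_tensors="pt")
# Unified pattern:
batch = tokenizer(texts, padding=True, return_tensors="pt")
print(batch["input_ids"].shape)
```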

@copy-pr-bot

copy-pr-bot bot commented Mar 4, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Mar 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

📝 Walkthrough

Replaced many Hugging Face torch_dtype kwargs with dtype, made trust_remote_code opt-in and propagated it across examples and internals, adjusted quantization/MoE/trace internals for Transformers v5 semantics, removed a scheduled GitHub Actions workflow, and pruned pinned example requirements and some test decorators.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Removed CI workflow**<br>`.github/workflows/delete_outdated_pr_branches.yml` | Deleted the scheduled/manual GitHub Actions workflow that deleted outdated PR branches. |
| **Transformers bounds & runtime warning**<br>`pyproject.toml`, `modelopt/torch/__init__.py`, `CHANGELOG.rst`, `tox.ini` | Raised minimum transformers to >=5.0, removed prior <5.0 upper-bounds, and updated the runtime compatibility warning/patching logic for pre-5.0 behavior. |
| **Global dtype API (torch_dtype → dtype)**<br>`examples/...`, `modelopt/...`, `examples/windows/...`, `modelopt/onnx/...` | Replaced torch_dtype with dtype in from_pretrained/model-construction callsites to align with the Transformers v5 API (see the sketch below the table). |
| **Make trust_remote_code configurable**<br>`examples/gpt-oss/...`, `examples/llm_autodeploy/...`, `examples/llm_ptq/...`, `examples/speculative_decoding/...`, `examples/llm_eval/...`, `examples/.../vlm_utils.py`, `modelopt/torch/speculative/utils.py`, `examples/windows/...` | Added trust_remote_code: bool = False CLI flags/dataclass fields and threaded the value into AutoTokenizer.from_pretrained / AutoModel.from_pretrained and related helpers instead of hardcoding True. |
| **QTensor wrapper restore & RealQuantLinear behavior**<br>`modelopt/torch/opt/plugins/transformers.py`, `modelopt/torch/quantization/nn/modules/quant_linear.py`, `modelopt/torch/quantization/plugins/accelerate.py` | Added `_restore_qtensor_wrappers` to re-wrap saved QTensor metadata after patched from_pretrained; updated `RealQuantLinear._setup` to preserve/restore existing QTensorWrapper metadata and adjusted dtype fallback/forwarding logic. |
| **Quantization / HuggingFace plugin (DBRX/MoE) updates**<br>`modelopt/torch/quantization/plugins/huggingface.py`, `modelopt/torch/quantization/utils/core_utils.py` | Changed DBRX expert APIs/weight shapes and `_QuantDbrxExperts.forward` parameter ordering; made sync_moe_expert_amax a no-op for non-iterable/batched expert containers. |
| **Trace & speculative runtime patches**<br>`modelopt/torch/trace/plugins/transformers.py`, `modelopt/torch/speculative/plugins/transformers.py`, `modelopt/torch/speculative/utils.py` | Added an FX-trace-friendly BertLayer.forward patch for Transformers >=5.0, inlined DynamicCache creation where used, and made load_vlm_or_llm_with_kwargs accept trust_remote_code. |
| **Wrapper model / past_key_values handling**<br>`modelopt/onnx/llm_export_utils/export_utils.py` | Switched model loading to dtype and reconstructed past_key_values tuples from cache fields instead of using .to_legacy_cache(). |
| **Tokenizer / dataset API updates**<br>`examples/llm_sparsity/...`, `examples/windows/onnx_ptq/...`, `modelopt/torch/utils/speech_dataset_utils.py`, `examples/llm_eval/lm_eval_hf.py` | Replaced tokenizer.batch_encode_plus(...) with direct tokenizer(...) calls; removed or consolidated trust_remote_code=True on load_dataset calls; moved import datasets out of conditionals. |
| **Warmup/config key changes**<br>`examples/gpt-oss/configs/sft_*.yaml`, `examples/llm_qat/launch.sh`, `examples/llm_qat/notebooks/*`, `examples/llm_sparsity/weight_sparsity/launch_finetune.sh` | Switched warmup keys/CLI args from warmup_ratio to warmup_steps. |
| **Device placement & adapter timing**<br>`examples/llm_sparsity/attention_sparsity/hf_sa.py`, `modelopt/torch/quantization/plugins/transformers_trainer.py` | Used .to(model.device) instead of .cuda() and moved LoRA adapter insertion earlier (before super init) using the provided training args. |
| **Finetune resume/overwrite logic**<br>`examples/llm_sparsity/weight_sparsity/finetune.py` | Simplified checkpoint gating to rely on resume_from_checkpoint and removed the previous overwrite-output-dir error path. |
| **Tests: decorators, fixtures, refactors**<br>`tests/...` (multiple files) | Removed many @pytest.mark.manual markers, introduced skip/gates for transformers<5.0, refactored class-based tests to module-level functions, adjusted tiny model fixtures to use torch.float32, and skipped Whisper tests that require system deps. |
| **Requirements & docs pruning**<br>`examples/.../requirements*.txt`, `examples/llm_ptq/README.md`, `examples/speculative_decoding/README.md` | Removed or loosened several pinned dependencies (transformers pins, librosa, soundfile, deepspeed, etc.) and updated README notes/examples. |
| **Misc examples & scripts**<br>`examples/speculative_decoding/scripts/*`, `examples/gpt-oss/sft.py`, `examples/llm_qad/*`, `examples/speculative_decoding/*`, `examples/...` | Minor refactors: added/propagated trust_remote_code flags, removed redundant redeclarations, and reformatted calls to align with v5 API changes. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Script
    participant HF as "HuggingFace.from_pretrained"
    participant ModelOpt as "ModelOpt plugin (_restore_qtensor_wrappers)"
    participant FS as "modelopt_state.pth (FS)"

    User->>Script: run (optional --trust_remote_code)
    Script->>HF: from_pretrained(..., dtype=..., trust_remote_code=...)
    HF-->>Script: returns model instance
    Script->>ModelOpt: patched hook invoked after instantiation
    ModelOpt->>FS: check for modelopt_state.pth
    FS-->>ModelOpt: q_tensor_state (if present)
    ModelOpt->>ModelOpt: re-wrap weights preserving QTensorWrapper metadata
    ModelOpt-->>Script: model with restored wrappers
    Script-->>User: continue (quantize/export/generate)
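To make the diagram concrete, here is a hypothetical Python sketch of the restore hook it describes; apart from the `_restore_qtensor_wrappers` name and the `modelopt_state.pth` file mentioned above, all helper names, attributes, and the state layout are illustrative and this is not ModelOpt's actual implementation:

```python
# Hypothetical sketch of the restore flow in the diagram above.
import os

import torch


def _restore_qtensor_wrappers(model, checkpoint_dir):
    """Re-wrap quantized weights after a patched from_pretrained call (sketch)."""
    state_path = os.path.join(checkpoint_dir, "modelopt_state.pth")
    if not os.path.isfile(state_path):
        return model  # nothing to restore for unquantized checkpoints

    # weights_only=True keeps the load restricted to plain tensors/containers.
    # Assumes the saved state maps module names to quantization metadata.
    q_tensor_state = torch.load(state_path, weights_only=True)
    for name, module in model.named_modules():
        metadata = q_tensor_state.get(name)
        if metadata is not None:
            # Placeholder for re-wrapping module.weight in a QTensorWrapper
            # while preserving the saved quantization metadata.
            module._modelopt_qtensor_metadata = metadata
    return model
```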

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Security Anti-Patterns | ❌ Error | Docstring example contains hardcoded trust_remote_code=True without justification, violating SECURITY.md requirements and contradicting PR objectives. | Remove trust_remote_code=True parameter, add inline comment justifying its necessity, or make it a configurable parameter with secure False default. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 34.74%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly summarizes the main objectives: upgrading transformers to 5.0, updating minimum torch to 2.8, and fixing test issues. It directly reflects the primary changes across the changeset. |
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch kmorabi/bump-transformers-5.0

Comment @coderabbitai help to get the list of available commands and usage tips.

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from 3e28ada to 2b24815 Compare March 24, 2026 21:34
@github-actions
Contributor

github-actions bot commented Mar 24, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-975/

Built to branch gh-pages at 2026-03-31 20:55 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@kevalmorabia97
Collaborator Author

/ok to test 2b24815

@kevalmorabia97
Collaborator Author

/ok to test 1f0726e

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from 1f0726e to 48b426f Compare March 25, 2026 08:24
@kevalmorabia97
Collaborator Author

/ok to test 48b426f

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from 48b426f to 0781ac7 Compare March 25, 2026 08:42
@codecov

codecov bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 41.66667% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.63%. Comparing base (74a8694) to head (d5b61cb).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| `...lopt/torch/quantization/nn/modules/quant_linear.py` | 23.07% | 10 Missing ⚠️ |
| `modelopt/torch/quantization/utils/core_utils.py` | 0.00% | 2 Missing ⚠️ |
| `modelopt/torch/__init__.py` | 75.00% | 1 Missing ⚠️ |
| `modelopt/torch/quantization/backends/nvfp4_gemm.py` | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #975      +/-   ##
==========================================
+ Coverage   70.21%   70.63%   +0.41%     
==========================================
  Files         230      230              
  Lines       26073    26083      +10     
==========================================
+ Hits        18308    18423     +115     
+ Misses       7765     7660     -105     

☔ View full report in Codecov by Sentry.

@kevalmorabia97 kevalmorabia97 marked this pull request as ready for review March 25, 2026 09:41
@kevalmorabia97 kevalmorabia97 requested review from a team as code owners March 25, 2026 09:41
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from e740745 to 26cf04a Compare March 31, 2026 09:54
decoder_cls = _setup_kimi_k2_decoder()

self.eagle_config = PretrainedConfig.from_dict(config.eagle_architecture_config)
arch_config = config.eagle_architecture_config

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch 2 times, most recently from 9791ff0 to ed000d2 Compare March 31, 2026 16:38
…pu tests for torch 2.8

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from ed000d2 to 6d3af7c Compare March 31, 2026 16:42
Collaborator

@cjluo-nv cjluo-nv left a comment


llm_ptq LGTM

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from 533e37e to 6bd8706 Compare March 31, 2026 20:43
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from 6bd8706 to d5b61cb Compare March 31, 2026 20:51
@kevalmorabia97 kevalmorabia97 changed the title from "Add experimental support for transformers>=5.0" to "Add experimental support for transformers>=5.0 + Bug fixes" on Mar 31, 2026