Add experimental support for transformers>=5.0 + min torch 2.8 + bug fixes for tests #975

Open

kevalmorabia97 wants to merge 21 commits into main from kmorabi/bump-transformers-5.0

Conversation

@kevalmorabia97
Collaborator

@kevalmorabia97 kevalmorabia97 commented Mar 4, 2026

What does this PR do?

  • Add experimental support for transformers >=5.0 and remove usages of deprecated APIs: https://github.com/huggingface/transformers/blob/main/MIGRATION_GUIDE_V5.md
  • ⚠️ For accelerate examples that used --warmup-ratio: float (deprecated in 5.x), we now use --warmup-steps: float | int, which is interpreted as a ratio when a float is passed, but only on 5.x. On 4.x, a float value errors out and prompts the user to either switch back to --warmup-ratio or pass an absolute integer step count (see the sketch after this list).
  • ⚠️ Unified Hugging Face checkpoint export for quantized checkpoints may not work yet for some models with transformers>=5.0, since it requires several fixes (e.g. the change in how MoE experts are organized)
    • Added a workaround for TRT-LLM's imports of deprecated transformers functions so that TRT-LLM based GPU unit tests pass. Model deployment still needs proper fixes directly in TRT-LLM, so the llm/vlm PTQ example tests continue to run with transformers 4.57
    • Everything except PTQ and Export should work fine with transformers>=5.0
  • Remove hard-coded trust_remote_code=True
  • Bump minimum torch to 2.8 and enable torch 2.11 CI/CD testing
  • Fix tests/examples/speculative_decoding, which was previously silently skipped
  • NOTE: The upcoming NeMo 26.04 container might ship with transformers>=5.0
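As a rough illustration of the --warmup-steps behavior described above, here is a hypothetical sketch of how a float ratio vs. integer step count could be handled; the parser wiring and helper name are illustrative, not the exact code in this PR:

```python
# Hypothetical sketch only: parse --warmup-steps as an int step count,
# or as a float ratio when transformers>=5.0 is installed.
import argparse

import transformers
from packaging.version import Version


def parse_warmup_steps(value: str) -> float | int:
    """Accept an integer step count, or a float ratio (transformers>=5.0 only)."""
    number = float(value)
    if number.is_integer():
        return int(number)
    if Version(transformers.__version__) < Version("5.0"):
        raise argparse.ArgumentTypeError(
            "Float warmup ratios require transformers>=5.0; pass an integer "
            "step count or use --warmup-ratio instead."
        )
    return number


parser = argparse.ArgumentParser()
parser.add_argument("--warmup-steps", type=parse_warmup_steps, default=0)
args = parser.parse_args(["--warmup-steps", "0.1"])
print(args.warmup_steps)  # 0.1, interpreted as a ratio on transformers>=5.0
```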

Testing

  • CI/CD tests passing
  • Manually ran unit tests and GPU tests with transformers 4.56 and 5.4
  • Manually ran example tests (except the trt-llm container tests) with transformers 4.56 and 5.4
  • 2-GPU nightly CI/CD tests manually triggered and passing

Before your PR is "Ready for review"

Make sure you read and follow the Contributor guidelines and that your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, using torch.load(..., weights_only=True), avoiding pickle, etc.).
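As a minimal sketch of the two security practices mentioned above (the checkpoint name, model name, and flag wiring are placeholders, not code from this PR):

```python
# Illustrative sketch: make trust_remote_code an opt-in flag with a secure
# default, and load checkpoints with weights_only=True.
import argparse
import os

import torch
from transformers import AutoModelForCausalLM

parser = argparse.ArgumentParser()
# Opt-in flag instead of hardcoding trust_remote_code=True.
parser.add_argument("--trust_remote_code", action="store_true", default=False)
args = parser.parse_args()

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", trust_remote_code=args.trust_remote_code
)

# weights_only=True rejects arbitrary pickled objects in the checkpoint.
if os.path.isfile("modelopt_state.pth"):
    state = torch.load("modelopt_state.pth", weights_only=True)
```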

  • Is this change backward compatible?: ✅
  • If you copied code from any other source, did you follow IP policy in CONTRIBUTING.md?: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: ✅

Summary by CodeRabbit

  • New Features

    • Make remote-code usage opt-in via a configurable --trust_remote_code flag across examples and tools.
  • Bug Fixes

    • Improve checkpoint/resume detection and related training guidance to avoid spurious errors.
  • Refactor

    • Consolidate dtype/config naming, switch warmup settings from ratio → steps, and unify tokenizer invocation patterns (see the sketch after this list).
  • Documentation

    • Simplify changelog title and add misc notes for release 0.44.
  • Chores

    • Remove scheduled PR-branch cleanup workflow and relax/remove several transformers version pins.
  • Tests

    • Adjust test gates, skips, and structures to align with updated deps and behaviors.
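A minimal sketch of the tokenizer-call unification mentioned above; the checkpoint name and input texts are illustrative:

```python
# Sketch: replace the deprecated batch_encode_plus pattern with a direct
# tokenizer(...) call, which works the same on transformers 4.x and 5.x.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
texts = ["Hello world", "Quantize all the things"]

# Deprecated pattern being removed:
#   batch = tokenizer.batch_encode_plus(texts, padding=True, return_tensors="pt")
# Unified pattern:
batch = tokenizer(texts, padding=True, return_tensors="pt")
print(batch["input_ids"].shape)
```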

@copy-pr-bot

copy-pr-bot bot commented Mar 4, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Mar 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

📝 Walkthrough

Replaced many Hugging Face torch_dtype kwargs with dtype, made trust_remote_code opt-in and propagated it across examples and internals, adjusted quantization/MoE/trace internals for Transformers v5 semantics, removed a scheduled GitHub Actions workflow, and pruned pinned example requirements and some test decorators.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Removed CI workflow**<br>`.github/workflows/delete_outdated_pr_branches.yml` | Deleted the scheduled/manual GitHub Actions workflow that deleted outdated PR branches. |
| **Transformers bounds & runtime warning**<br>`pyproject.toml`, `modelopt/torch/__init__.py`, `CHANGELOG.rst`, `tox.ini` | Raised minimum transformers to >=5.0, removed prior <5.0 upper-bounds, and updated the runtime compatibility warning/patching logic for pre-5.0 behavior. |
| **Global dtype API (torch_dtype → dtype)**<br>`examples/...`, `modelopt/...`, `examples/windows/...`, `modelopt/onnx/...` | Replaced torch_dtype with dtype in from_pretrained/model-construction callsites to align with the Transformers v5 API (see the sketch below the table). |
| **Make trust_remote_code configurable**<br>`examples/gpt-oss/...`, `examples/llm_autodeploy/...`, `examples/llm_ptq/...`, `examples/speculative_decoding/...`, `examples/llm_eval/...`, `examples/.../vlm_utils.py`, `modelopt/torch/speculative/utils.py`, `examples/windows/...` | Added trust_remote_code: bool = False CLI flags/dataclass fields and threaded the value into AutoTokenizer.from_pretrained / AutoModel.from_pretrained and related helpers instead of hardcoding True. |
| **QTensor wrapper restore & RealQuantLinear behavior**<br>`modelopt/torch/opt/plugins/transformers.py`, `modelopt/torch/quantization/nn/modules/quant_linear.py`, `modelopt/torch/quantization/plugins/accelerate.py` | Added `_restore_qtensor_wrappers` to re-wrap saved QTensor metadata after patched from_pretrained; updated `RealQuantLinear._setup` to preserve/restore existing QTensorWrapper metadata and adjusted dtype fallback/forwarding logic. |
| **Quantization / HuggingFace plugin (DBRX/MoE) updates**<br>`modelopt/torch/quantization/plugins/huggingface.py`, `modelopt/torch/quantization/utils/core_utils.py` | Changed DBRX expert APIs/weight shapes and `_QuantDbrxExperts.forward` parameter ordering; made sync_moe_expert_amax a no-op for non-iterable/batched expert containers. |
| **Trace & speculative runtime patches**<br>`modelopt/torch/trace/plugins/transformers.py`, `modelopt/torch/speculative/plugins/transformers.py`, `modelopt/torch/speculative/utils.py` | Added an FX-trace-friendly BertLayer.forward patch for Transformers >=5.0, inlined DynamicCache creation where used, and made load_vlm_or_llm_with_kwargs accept trust_remote_code. |
| **Wrapper model / past_key_values handling**<br>`modelopt/onnx/llm_export_utils/export_utils.py` | Switched model loading to dtype and reconstructed past_key_values tuples from cache fields instead of using .to_legacy_cache(). |
| **Tokenizer / dataset API updates**<br>`examples/llm_sparsity/...`, `examples/windows/onnx_ptq/...`, `modelopt/torch/utils/speech_dataset_utils.py`, `examples/llm_eval/lm_eval_hf.py` | Replaced tokenizer.batch_encode_plus(...) with direct tokenizer(...) calls; removed or consolidated trust_remote_code=True on load_dataset calls; moved import datasets out of conditionals. |
| **Warmup/config key changes**<br>`examples/gpt-oss/configs/sft_*.yaml`, `examples/llm_qat/launch.sh`, `examples/llm_qat/notebooks/*`, `examples/llm_sparsity/weight_sparsity/launch_finetune.sh` | Switched warmup keys/CLI args from warmup_ratio to warmup_steps. |
| **Device placement & adapter timing**<br>`examples/llm_sparsity/attention_sparsity/hf_sa.py`, `modelopt/torch/quantization/plugins/transformers_trainer.py` | Used .to(model.device) instead of .cuda() and moved LoRA adapter insertion earlier (before super init) using the provided training args. |
| **Finetune resume/overwrite logic**<br>`examples/llm_sparsity/weight_sparsity/finetune.py` | Simplified checkpoint gating to rely on resume_from_checkpoint and removed the previous overwrite-output-dir error path. |
| **Tests: decorators, fixtures, refactors**<br>`tests/...` (multiple files) | Removed many @pytest.mark.manual markers, introduced skip/gates for transformers<5.0, refactored class-based tests to module-level functions, adjusted tiny model fixtures to use torch.float32, and skipped Whisper tests that require system deps. |
| **Requirements & docs pruning**<br>`examples/.../requirements*.txt`, `examples/llm_ptq/README.md`, `examples/speculative_decoding/README.md` | Removed or loosened several pinned dependencies (transformers pins, librosa, soundfile, deepspeed, etc.) and updated README notes/examples. |
| **Misc examples & scripts**<br>`examples/speculative_decoding/scripts/*`, `examples/gpt-oss/sft.py`, `examples/llm_qad/*`, `examples/speculative_decoding/*`, `examples/...` | Minor refactors: added/propagated trust_remote_code flags, removed redundant redeclarations, and reformatted calls to align with v5 API changes. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Script
    participant HF as "HuggingFace.from_pretrained"
    participant ModelOpt as "ModelOpt plugin (_restore_qtensor_wrappers)"
    participant FS as "modelopt_state.pth (FS)"

    User->>Script: run (optional --trust_remote_code)
    Script->>HF: from_pretrained(..., dtype=..., trust_remote_code=...)
    HF-->>Script: returns model instance
    Script->>ModelOpt: patched hook invoked after instantiation
    ModelOpt->>FS: check for modelopt_state.pth
    FS-->>ModelOpt: q_tensor_state (if present)
    ModelOpt->>ModelOpt: re-wrap weights preserving QTensorWrapper metadata
    ModelOpt-->>Script: model with restored wrappers
    Script-->>User: continue (quantize/export/generate)
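To make the diagram concrete, here is a hypothetical Python sketch of the restore hook it describes; apart from the `_restore_qtensor_wrappers` name and the `modelopt_state.pth` file mentioned above, all helper names, attributes, and the state layout are illustrative and this is not ModelOpt's actual implementation:

```python
# Hypothetical sketch of the restore flow in the diagram above.
import os

import torch


def _restore_qtensor_wrappers(model, checkpoint_dir):
    """Re-wrap quantized weights after a patched from_pretrained call (sketch)."""
    state_path = os.path.join(checkpoint_dir, "modelopt_state.pth")
    if not os.path.isfile(state_path):
        return model  # nothing to restore for unquantized checkpoints

    # weights_only=True keeps the load restricted to plain tensors/containers.
    # Assumes the saved state maps module names to quantization metadata.
    q_tensor_state = torch.load(state_path, weights_only=True)
    for name, module in model.named_modules():
        metadata = q_tensor_state.get(name)
        if metadata is not None:
            # Placeholder for re-wrapping module.weight in a QTensorWrapper
            # while preserving the saved quantization metadata.
            module._modelopt_qtensor_metadata = metadata
    return model
```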

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Security Anti-Patterns | ❌ Error | Docstring example contains hardcoded trust_remote_code=True without justification, violating SECURITY.md requirements and contradicting PR objectives. | Remove trust_remote_code=True parameter, add inline comment justifying its necessity, or make it a configurable parameter with secure False default. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 34.74%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly summarizes the main objectives: upgrading transformers to 5.0, updating minimum torch to 2.8, and fixing test issues. It directly reflects the primary changes across the changeset. |
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch kmorabi/bump-transformers-5.0

Comment @coderabbitai help to get the list of available commands and usage tips.

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from 3e28ada to 2b24815 Compare March 24, 2026 21:34
@github-actions
Contributor

github-actions bot commented Mar 24, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-975/

Built to branch gh-pages at 2026-03-31 20:55 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@kevalmorabia97
Collaborator Author

/ok to test 2b24815

@kevalmorabia97
Collaborator Author

/ok to test 1f0726e

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from 1f0726e to 48b426f Compare March 25, 2026 08:24
@kevalmorabia97
Collaborator Author

/ok to test 48b426f

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from 48b426f to 0781ac7 Compare March 25, 2026 08:42
@codecov

codecov bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 41.66667% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.63%. Comparing base (74a8694) to head (d5b61cb).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| `...lopt/torch/quantization/nn/modules/quant_linear.py` | 23.07% | 10 Missing ⚠️ |
| `modelopt/torch/quantization/utils/core_utils.py` | 0.00% | 2 Missing ⚠️ |
| `modelopt/torch/__init__.py` | 75.00% | 1 Missing ⚠️ |
| `modelopt/torch/quantization/backends/nvfp4_gemm.py` | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #975      +/-   ##
==========================================
+ Coverage   70.21%   70.63%   +0.41%     
==========================================
  Files         230      230              
  Lines       26073    26083      +10     
==========================================
+ Hits        18308    18423     +115     
+ Misses       7765     7660     -105     

☔ View full report in Codecov by Sentry.

@kevalmorabia97 kevalmorabia97 marked this pull request as ready for review March 25, 2026 09:41
@kevalmorabia97 kevalmorabia97 requested review from a team as code owners March 25, 2026 09:41
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from e740745 to 26cf04a Compare March 31, 2026 09:54
decoder_cls = _setup_kimi_k2_decoder()

self.eagle_config = PretrainedConfig.from_dict(config.eagle_architecture_config)
arch_config = config.eagle_architecture_config

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch 2 times, most recently from 9791ff0 to ed000d2 Compare March 31, 2026 16:38
…pu tests for torch 2.8

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from ed000d2 to 6d3af7c Compare March 31, 2026 16:42
Collaborator

@cjluo-nv cjluo-nv left a comment


llm_ptq LGTM

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from 533e37e to 6bd8706 Compare March 31, 2026 20:43
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from 6bd8706 to d5b61cb Compare March 31, 2026 20:51
@kevalmorabia97 kevalmorabia97 changed the title from "Add experimental support for transformers>=5.0" to "Add experimental support for transformers>=5.0 + Bug fixes" on Mar 31, 2026