
chore(deps-dev): bump @huggingface/transformers from 3.8.1 to 4.0.1 #831

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/npm_and_yarn/huggingface/transformers-4.0.1

Conversation


@dependabot dependabot bot commented on behalf of github Apr 4, 2026

Bumps @huggingface/transformers from 3.8.1 to 4.0.1.

Release notes

Sourced from @huggingface/transformers's releases.

4.0.0

🚀 Transformers.js v4

We're excited to announce that Transformers.js v4 is now available on NPM! After a year of development (we started in March 2025 🤯), we're finally ready for you to use it.

npm i @huggingface/transformers

Links: YouTube Video, Blog Post, Demo Collection

New WebGPU backend

The biggest change is undoubtedly the adoption of a new WebGPU Runtime, completely rewritten in C++. We've worked closely with the ONNX Runtime team to thoroughly test this runtime across our ~200 supported model architectures, as well as many new v4-exclusive architectures.

In addition to better operator support (for performance, accuracy, and coverage), this new WebGPU runtime allows the same transformers.js code to be used across a wide variety of JavaScript environments, including browsers, server-side runtimes, and desktop applications. That's right, you can now run WebGPU-accelerated models directly in Node, Bun, and Deno!
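As a sketch of what this enables (the model name, option values, and helper function below are illustrative assumptions, not taken from this PR), the same `pipeline` call can request WebGPU acceleration from Node:

```javascript
// Illustrative sketch only: the pipeline call is commented out because it
// downloads a model; the model name and option values are assumptions.
//
// import { pipeline } from '@huggingface/transformers';
// const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
//   device: 'webgpu',  // select the new WebGPU runtime
//   dtype: 'q8',       // quantized weights
// });
// const output = await extractor(['hello world'], { pooling: 'mean', normalize: true });
// // output.data is a flat Float32Array of embeddings.

// With normalize: true, cosine similarity between two returned embeddings
// reduces to a plain dot product over the flat arrays:
function cosineFromNormalized(a, b) {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}
```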

We've proven that it's possible to run state-of-the-art AI models 100% locally in the browser, and now we're focused on performance: making these models run as fast as possible, even in resource-constrained environments. This required completely rethinking our export strategy, especially for large language models. We achieve this by re-implementing new models operation by operation, leveraging specialized ONNX Runtime Contrib Operators like com.microsoft.GroupQueryAttention, com.microsoft.MatMulNBits, or com.microsoft.QMoE to maximize performance.

For example, by adopting the com.microsoft.MultiHeadAttention operator, we achieved a ~4x speedup for BERT-based embedding models.

New models

Thanks to our new export strategy and ONNX Runtime's expanding support for custom operators, we've been able to add many new models and architectures to Transformers.js v4. These include popular models like GPT-OSS, Chatterbox, GraniteMoeHybrid, LFM2-MoE, HunYuanDenseV1, Apertus, Olmo3, FalconH1, and Youtu-LLM. Many of these required us to implement support for advanced architectural patterns, including Mamba (state-space models), Multi-head Latent Attention (MLA), and Mixture of Experts (MoE). Perhaps most importantly, these models are all compatible with WebGPU, allowing users to run them directly in the browser or server-side JavaScript environments with hardware acceleration. We've released several Transformers.js v4 demos so far... and we'll continue to release more!

Additionally, we've added support for larger models exceeding 8B parameters. In our tests, we've been able to run GPT-OSS 20B (q4f16) at ~60 tokens per second on an M4 Pro Max.

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [@huggingface/transformers](https://github.com/huggingface/transformers.js) from 3.8.1 to 4.0.1.
- [Release notes](https://github.com/huggingface/transformers.js/releases)
- [Commits](https://github.com/huggingface/transformers.js/commits)

---
updated-dependencies:
- dependency-name: "@huggingface/transformers"
  dependency-version: 4.0.1
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot added dependencies Pull requests that update a dependency file javascript Pull requests that update javascript code labels Apr 4, 2026

greptile-apps bot commented Apr 4, 2026

Greptile Summary

This Dependabot PR bumps @huggingface/transformers from 3.8.1 to 4.0.1 across both devDependencies and peerDependencies. The jump is a major version (v3 → v4) and carries one P1 API break:

  • The quantized: true pipeline option used in src/domain/search/models.ts (line 197) was replaced by a dtype string parameter in Transformers.js v3 and is confirmed removed in v4. Without a fix, the minilm model loads in full fp32 precision instead of q8, quadrupling its memory footprint.

Confidence Score: 4/5

Unsafe to merge without fixing the quantized→dtype API change in models.ts; otherwise the upgrade is clean

One clear P1: the quantized: true option no longer has any effect in v4, causing minilm to load unquantized. All other public API surface (pipeline, feature-extraction, dispose, Xenova/ model names, output.data) is confirmed compatible. Tests mock the library so CI stays green regardless.

src/domain/search/models.ts — needs quantized: true replaced with dtype: 'q8' before merging

Important Files Changed

| Filename | Overview |
| --- | --- |
| package.json | Bumps @huggingface/transformers devDependency and peerDependency from ^3.8.1 to ^4.0.1; a major-version jump that drops the legacy quantized pipeline option used in models.ts |
| package-lock.json | Lock file updated to resolve @huggingface/transformers 4.0.1 and its new transitive dependencies; no concerns beyond the source-level API change |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[loadModel called] --> B{extractor cached?}
    B -- yes --> C[Return cached extractor]
    B -- no --> D[loadTransformers / dynamic import]
    D --> E["pipeline('feature-extraction', modelName, pipelineOpts)"]
    E --> F{config.quantized?}
    F -- "v3: {quantized:true} → loads q8 model" --> G["✅ ~23 MB"]
    F -- "v4: {quantized:true} ignored → loads fp32 model" --> H["⚠️ ~92 MB (4×)"]
    F -- "Fix: {dtype:'q8'} → loads q8 model" --> I["✅ ~23 MB"]
    G --> J[Extractor ready]
    H --> J
    I --> J
    J --> K["extractor(batch, {pooling:'mean', normalize:true})"]
    K --> L["output.data — flat Float32 array (unchanged in v4)"]
```

Comments Outside Diff (1)

  1. src/domain/search/models.ts, line 197 (link)

    P1 quantized option silently dropped in v4 — minilm loads unquantized

    The official Transformers.js docs state: "Before Transformers.js v3, we used the quantized option… Now, we've added the ability to select from a much larger list with the dtype parameter." With this bump to v4, { quantized: true } is no longer a recognized pipeline option and will be silently ignored, causing the minilm model to load in full fp32 precision instead of the intended q8 variant — roughly 4× the memory footprint.
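A minimal sketch of the fix; only the quantized → dtype rename comes from the docs quoted above, and the migration helper is a hypothetical convenience, not part of either library:

```javascript
// v3 (legacy): await pipeline('feature-extraction', modelName, { quantized: true });
// v4 (fix):    await pipeline('feature-extraction', modelName, { dtype: 'q8' });

// Hypothetical helper for codebases with several call sites: translate the
// legacy v3 `quantized` flag into the v4 `dtype` parameter, leaving all
// other options untouched.
function migratePipelineOptions(opts = {}) {
  const { quantized, ...rest } = opts;
  if (quantized === undefined) return rest;           // nothing to migrate
  return { ...rest, dtype: quantized ? 'q8' : 'fp32' };
}
```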

