
Add Gemma 4 E2B/E4B support (text-only)#18695

Open
Phineas1500 wants to merge 1 commit into pytorch:main from Phineas1500:codex/gemma4-e2b-e4b-support

Conversation


@Phineas1500 (Contributor) commented Apr 3, 2026

Summary

Add native text-only Gemma 4 support for google/gemma-4-E2B and google/gemma-4-E4B in the ExecuTorch LLM export path.

Why

Gemma 4 E2B/E4B do not fit the existing Llama/Qwen config-only export path: supporting them requires new model and runtime behavior plus a checkpoint conversion path, not just new repo IDs and JSON configs.

What Changed

  • Register gemma4_e2b and gemma4_e4b as first-class export targets.
  • Add a new examples/models/gemma4 package with configs, converter, BUCK target, and README.
  • Extend the native text runtime for Gemma 4-specific behavior, including:
    • layer-type-aware sliding/full attention
    • dual RoPE behavior
    • shared-KV reuse
    • per-layer input embeddings / scaling
    • GELU-tanh MLP support
    • post-attention and post-FFN norms
    • layer scaling and final logit softcapping
  • Carry Gemma 4 attention scaling through the custom-SDPA export path.
  • Add focused regression coverage for Gemma 4 support.
  • Add two small supporting fixes discovered during validation:
    • source-tree import cleanup in examples/models/model_factory.py
    • source-tree flatbuffer schema fallback in exir/_serialize/_flatbuffer.py
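Two of the runtime behaviors listed above, layer-type-aware sliding/full attention and final-logit softcapping, can be illustrated with a minimal sketch. This is not the ExecuTorch implementation; the function names, the mask representation, and the window semantics are assumptions for illustration only.

```python
import math

def attention_mask(seq_len: int, sliding: bool, window: int) -> list[list[bool]]:
    """Hypothetical causal mask builder: True where query q may attend to key k.

    Full-attention layers use a plain causal mask; sliding layers additionally
    restrict attention to the last `window` positions (an assumed semantics).
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: no attending to future positions
            if sliding:
                visible = visible and (q - k < window)  # sliding-window cutoff
            row.append(visible)
        mask.append(row)
    return mask

def softcap(logits: list[float], cap: float) -> list[float]:
    """Final-logit softcapping: cap * tanh(x / cap) bounds values in (-cap, cap)."""
    return [cap * math.tanh(x / cap) for x in logits]
```

A per-layer flag (e.g. a list of layer types from the model config) would select `sliding=True` or `sliding=False` when building each layer's mask.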

Validation

Ran:

conda activate et_pt211_clean
export PYTHONNOUSERSITE=1
export PYTHONPATH=..
python -m unittest \
  executorch.examples.models.test.test_model_factory \
  executorch.exir._serialize.test.test_flatbuffer \
  executorch.examples.models.llama.tests.test_gemma4_support \
  executorch.examples.models.qwen3_5.tests.test_convert_weights

Result: Ran 31 tests ... OK

Also validated with real HF checkpoint conversion/export/runtime smoke tests for both google/gemma-4-E2B and google/gemma-4-E4B, including broad greedy-decoding parity checks against HF.

Prompt benchmark summary:

  • E4B: exact match on 11/12 prompts, first-token match on 12/12 prompts
  • E2B: exact match on 8/12 prompts, first-token match on 10/12 prompts

The remaining E2B drift was concentrated in open-ended near-tie generations rather than structural export failures.
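The two parity metrics quoted above (exact match and first-token match against HF greedy decodes) can be sketched as follows. This is a hypothetical harness, not the PR's test code, and the token sequences are made up:

```python
def parity(reference: list[list[int]], candidate: list[list[int]]) -> tuple[int, int]:
    """Count prompts whose greedy decode matches the reference exactly,
    and prompts whose first generated token matches."""
    exact = sum(r == c for r, c in zip(reference, candidate))
    first = sum(r[0] == c[0] for r, c in zip(reference, candidate) if r and c)
    return exact, first

# Made-up decodes for three prompts: one full match, two first-token matches.
ref = [[1, 2, 3], [4, 5], [6]]
out = [[1, 2, 3], [4, 9], [7]]
exact, first = parity(ref, out)  # exact = 1, first = 2
```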

Not Included In This PR

  • Gemma 4 multimodal support
  • Qualcomm/QNN or other backend-specific bring-up
  • A dedicated Gemma 4 runner / example app beyond the native text export path
  • CI end-to-end export coverage with real HF weights
  • Performance or memory tuning work beyond correctness bring-up


Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. release notes: examples Changes to any of our example LLMs integrations, such as Llama3 and Llava
