
Add Gemma 4 E2B/E4B support (text-only)#18695

Open
Phineas1500 wants to merge 1 commit into pytorch:main from Phineas1500:codex/gemma4-e2b-e4b-support

Conversation


@Phineas1500 (Contributor) commented Apr 3, 2026

Summary

Add native text-only Gemma 4 support for google/gemma-4-E2B and google/gemma-4-E4B in the ExecuTorch LLM export path.

Why

Gemma 4 E2B/E4B do not fit the existing Llama/Qwen config-only export path: supporting them requires new model and runtime behavior plus a checkpoint conversion path, not just new repo IDs and JSON configs.

What Changed

  • Register gemma4_e2b and gemma4_e4b as first-class export targets.
  • Add a new examples/models/gemma4 package with configs, converter, BUCK target, and README.
  • Extend the native text runtime for Gemma 4-specific behavior, including:
    • layer-type-aware sliding/full attention
    • dual RoPE behavior
    • shared-KV reuse
    • per-layer input embeddings / scaling
    • GELU-tanh MLP support
    • post-attention and post-FFN norms
    • layer scaling and final logit softcapping
  • Carry Gemma 4 attention scaling through the custom-SDPA export path.
  • Add focused regression coverage for Gemma 4 support.
  • Add two small supporting fixes discovered during validation:
    • source-tree import cleanup in examples/models/model_factory.py
    • source-tree flatbuffer schema fallback in exir/_serialize/_flatbuffer.py
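Two of the runtime behaviors listed above, layer-type-aware sliding/full attention and final-logit softcapping, can be illustrated with a minimal sketch. This is not the ExecuTorch implementation; the function names, the mask representation, and the window semantics are assumptions for illustration only.

```python
import math

def attention_mask(seq_len: int, sliding: bool, window: int) -> list[list[bool]]:
    """Hypothetical causal mask builder: True where query q may attend to key k.

    Full-attention layers use a plain causal mask; sliding layers additionally
    restrict attention to the last `window` positions (an assumed semantics).
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: no attending to future positions
            if sliding:
                visible = visible and (q - k < window)  # sliding-window cutoff
            row.append(visible)
        mask.append(row)
    return mask

def softcap(logits: list[float], cap: float) -> list[float]:
    """Final-logit softcapping: cap * tanh(x / cap) bounds values in (-cap, cap)."""
    return [cap * math.tanh(x / cap) for x in logits]
```

A per-layer flag (e.g. a list of layer types from the model config) would select `sliding=True` or `sliding=False` when building each layer's mask.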

Validation

Ran:

conda activate et_pt211_clean
export PYTHONNOUSERSITE=1
export PYTHONPATH=..
python -m unittest \
  executorch.examples.models.test.test_model_factory \
  executorch.exir._serialize.test.test_flatbuffer \
  executorch.examples.models.llama.tests.test_gemma4_support \
  executorch.examples.models.qwen3_5.tests.test_convert_weights

Result: Ran 31 tests ... OK

Also validated with real HF checkpoint conversion/export/runtime smoke tests for both google/gemma-4-E2B and google/gemma-4-E4B, including broad greedy-decoding parity checks against HF.

Prompt benchmark summary:

  • E4B: exact match on 11/12 prompts, first-token match on 12/12 prompts
  • E2B: exact match on 8/12 prompts, first-token match on 10/12 prompts

The remaining E2B drift was concentrated in open-ended near-tie generations rather than structural export failures.
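The two parity metrics quoted above (exact match and first-token match against HF greedy decodes) can be sketched as follows. This is a hypothetical harness, not the PR's test code, and the token sequences are made up:

```python
def parity(reference: list[list[int]], candidate: list[list[int]]) -> tuple[int, int]:
    """Count prompts whose greedy decode matches the reference exactly,
    and prompts whose first generated token matches."""
    exact = sum(r == c for r, c in zip(reference, candidate))
    first = sum(r[0] == c[0] for r, c in zip(reference, candidate) if r and c)
    return exact, first

# Made-up decodes for three prompts: one full match, two first-token matches.
ref = [[1, 2, 3], [4, 5], [6]]
out = [[1, 2, 3], [4, 9], [7]]
exact, first = parity(ref, out)  # exact = 1, first = 2
```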

Not Included In This PR

  • Gemma 4 multimodal support
  • Qualcomm/QNN or other backend-specific bring-up
  • A dedicated Gemma 4 runner / example app beyond the native text export path
  • CI end-to-end export coverage with real HF weights
  • Performance or memory tuning work beyond correctness bring-up


Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. release notes: examples Changes to any of our example LLMs integrations, such as Llama3 and Llava
