
Remove no exp usage from logical rule Part II#3603

Open
NuojCheng wants to merge 1 commit into main from chengnuojin-no-exp2

Conversation


@NuojCheng NuojCheng commented Apr 8, 2026

Description

Follow-up to PR #3578.

This PR deprecates

  • activation_kv_batch_no_exp
  • activation_q_length_no_exp
  • activation_attn_length_no_exp

from logical names.

After this change

  • activation_kv_batch always includes the "expert" physical axis
  • activation_q_length and activation_attn_length do not include "expert"

Other logical names containing "_no_exp" will be deprecated in a follow-up PR.
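The change can be sketched with a minimal, hypothetical rule table. The axis tuples below are illustrative only (the real MaxText logical-to-physical rules live in the sharding config and differ); the point is that the "_no_exp" logical names disappear and the base names carry the intended physical axes directly:

```python
# Hypothetical sketch of the logical -> physical axis mapping after this PR.
# Axis tuples are illustrative, not the actual MaxText rules.
LOGICAL_AXIS_RULES = {
    # "expert" is now always part of the kv-batch sharding
    "activation_kv_batch": ("data", "expert"),
    # sequence-length axes no longer include "expert"
    "activation_q_length": ("sequence",),
    "activation_attn_length": ("sequence",),
}

def resolve(logical_name: str) -> tuple:
    """Return the physical mesh axes for a logical axis name.

    Rejects the deprecated "_no_exp" variants so callers migrate
    to the base names.
    """
    if logical_name.endswith("_no_exp"):
        base = logical_name[: -len("_no_exp")]
        raise ValueError(f"{logical_name} is deprecated; use {base}")
    return LOGICAL_AXIS_RULES[logical_name]
```

With this table, `resolve("activation_kv_batch")` yields a tuple containing "expert", while the length axes do not, matching the bullets above.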

Tests

CI tests covering attention.py.

Inference test:

NEW_MODEL_DESIGN=1 python src/maxtext/inference/vllm_decode.py src/maxtext/configs/post_train/rl.yml model_name=qwen3-30b-a3b tokenizer_path=Qwen/Qwen3-30B-A3B ici_tensor_parallelism=4 ici_expert_parallelism=1 enable_dp_attention=false hbm_utilization_vllm=0.3 load_parameters_path=gs://parambole-qwen3-moe-verification/unscanned/qwen3-30b-a3b-thinking-2507/14_08_2025/0/items vllm_hf_overrides='{architectures: ["MaxTextForCausalLM"]}' prompt="Suggest some famous landmarks in London." 2>&1 | tee qwen3_moe_vllm_0.log

Output: https://paste.googleplex.com/6211135167660032

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@NuojCheng NuojCheng added the draft Draft PR label Apr 8, 2026

codecov bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/maxtext/layers/attention_op.py 75.00% 1 Missing ⚠️


