Fix qwen2 vllm_mapping for tunix #3588

Open

ChingTsai wants to merge 1 commit into main from jimmytsai/fix_qwen2.5_vllm_mapping

Conversation

Collaborator

@ChingTsai ChingTsai commented Apr 7, 2026

Description

Update the vLLM mapping to match tpu-inference, similar to the changes in this Qwen3 PR.

FIXES: b/495594907
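
For reference, the mapping being fixed translates MaxText parameter paths into the names tpu-inference/vLLM expects when loading weights. A minimal sketch of the shape of such a mapping, where every key and value below is a hypothetical placeholder (the real qwen2 entries are in this PR's diff and the Qwen3 PR referenced above):

# Illustrative only: parameter names here are hypothetical placeholders,
# not the actual entries changed by this PR.
QWEN2_VLLM_MAPPING = {
    "decoder.layers.self_attention.query.kernel": "model.layers.*.self_attn.q_proj.weight",
    "decoder.layers.self_attention.key.kernel": "model.layers.*.self_attn.k_proj.weight",
    "decoder.layers.self_attention.value.kernel": "model.layers.*.self_attn.v_proj.weight",
}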

Tests

import transformers
from maxtext.utils import model_creation_utils
from maxtext.configs import pyconfig
from maxtext.integration.tunix.tunix_adapter import TunixMaxTextAdapter
from tunix.rl.rollout import base_rollout
from tunix.rl.rollout.vllm_rollout import VllmRollout

def get_maxtext_model(config, devices=None):
  """Create a MaxText NNX model and wrap it in the Tunix adapter."""
  model, mesh = model_creation_utils.create_nnx_model(config, devices=devices)
  with mesh:
    # Use no-op weight mappings when a maxtext_config is forwarded through
    # vLLM's additional config.
    use_no_op_mappings = "maxtext_config" in config.vllm_additional_config
    tunix_model = TunixMaxTextAdapter(base_model=model, use_no_op_mappings=use_no_op_mappings)
    tunix_model.config = None
  return tunix_model, mesh

# Hardcoded arguments for pyconfig
init_argv = ["", 
    "src/maxtext/configs/base.yml",
    "model_name=qwen2.5-1.5b",
    "tokenizer_path=Qwen/Qwen2.5-1.5B-Instruct",
    "load_parameters_path=gs://xxxxx",
    "prompt=What are some famous landmarks in Paris?",
    "use_chat_template=True",
    "max_target_length=1024",
    "hbm_utilization_vllm=0.8",
]
config = pyconfig.initialize(init_argv)
maxtext_model, mesh = get_maxtext_model(config)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    config.tokenizer_path,
    token=config.hf_access_token,
    model_max_length=config.max_target_length,
)
tokenizer.bos_token = None

prompts = [config.prompt]
if config.use_chat_template:
    messages = [{"role": "user", "content": config.prompt}]
    input_with_chat_template = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        add_special_tokens=False,
    )
    prompts = [input_with_chat_template]

max_prompt_length = max(len(tokenizer.encode(p)) for p in prompts)
max_tokens_to_generate = config.max_target_length - max_prompt_length

# Build the vLLM rollout; the nested RolloutConfig carries the vLLM-specific
# settings, including the additional config that triggers the no-op mappings.
vllm_rollout = VllmRollout(
    model=maxtext_model,
    tokenizer=tokenizer,
    mesh=mesh,
    cache_config_or_size=1280,
    rollout_config=base_rollout.RolloutConfig(
        rollout_vllm_model_version=config.tokenizer_path,
        rollout_vllm_hbm_utilization=0.8,
        rollout_vllm_init_with_random_weights=True,
        rollout_vllm_tpu_backend_type="jax",
        data_type="bfloat16",
        data_parallel_size=8,
        max_tokens_to_generate=max_tokens_to_generate,
        rollout_vllm_additional_config={
            "maxtext_config": {"model_name": config.model_name}
        },
        max_prompt_length=max_prompt_length,
        temperature=config.decode_sampling_temperature,
    ),
)

# Rollout config used for the actual generation call.
base_rollout_config = base_rollout.RolloutConfig(
    max_tokens_to_generate=128,
    max_prompt_length=max_prompt_length,
    temperature=config.decode_sampling_temperature,
)

print(vllm_rollout.generate(prompts, base_rollout_config))

https://paste.googleplex.com/5407327072157696#l=1217
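
To reproduce, the script above can be saved to a file and run directly; the filename below is a hypothetical placeholder, and a real checkpoint path must be substituted for gs://xxxxx first:

# Hypothetical invocation, assuming the snippet is saved at the repo root.
python3 test_qwen2_vllm_rollout.py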

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov bot commented Apr 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@abhinavclemson abhinavclemson self-requested a review April 7, 2026 18:31
