DTensor path does not support Qwen3-VL GRPO #1699

@zpqiu

Description

Describe the bug

The DTensor path fails to load Qwen3-VL models such as Qwen/Qwen3-VL-2B-Instruct: the DTensor policy worker instantiates the checkpoint via AutoModelForCausalLM, which does not recognize Qwen3VLConfig.

Steps/Code to reproduce bug

uv run examples/run_vlm_grpo.py policy.model_name=Qwen/Qwen3-VL-2B-Instruct

Expected behavior

The DTensor path loads Qwen/Qwen3-VL-2B-Instruct successfully and the GRPO run proceeds, as it does for other supported VLMs.

Additional context

Traceback:

Traceback (most recent call last):
  File "/lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_nemorl/users/alexq/RL-main/examples/run_vlm_grpo.py", line 392, in <module>
    main()
  File "/lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_nemorl/users/alexq/RL-main/examples/run_vlm_grpo.py", line 372, in main
    ) = setup(config, tokenizer, dataset, val_dataset, processor=processor)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_nemorl/users/alexq/RL-main/nemo_rl/algorithms/grpo.py", line 575, in setup
    policy.print_node_ip_and_gpu_id()
  File "/lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_nemorl/users/alexq/RL-main/nemo_rl/models/policy/lm_policy.py", line 858, in print_node_ip_and_gpu_id
    results = ray.get(
              ^^^^^^^^
  File "/opt/nemo_rl_venv/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo_rl_venv/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo_rl_venv/lib/python3.12/site-packages/ray/_private/worker.py", line 2882, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo_rl_venv/lib/python3.12/site-packages/ray/_private/worker.py", line 970, in get_objects
    raise value
ray.exceptions.ActorDiedError: The actor died because of an error raised in its creation task, ray::lm_policy-0-0:DTensorPolicyWorkerV2.__init__() (pid=575996, ip=10.65.25.217, actor_id=af1b7d261b8dc3c83b2075fc01000000, repr=DTensorPolicyWorkerV2[rank=0])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_nemorl/users/alexq/RL-main/nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py", line 296, in __init__
    self.model = model_class.from_pretrained(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Automodel-workspace/Automodel/nemo_automodel/_transformers/auto_model.py", line 424, in from_pretrained
    raise e
  File "/opt/nemo-rl/3rdparty/Automodel-workspace/Automodel/nemo_automodel/_transformers/auto_model.py", line 410, in from_pretrained
    model = cls._from_pretrained_parent_class(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Automodel-workspace/Automodel/nemo_automodel/_transformers/auto_model.py", line 288, in _from_pretrained_parent_class
    model = super().from_pretrained(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ray_venvs/nemo_rl.models.policy.workers.dtensor_policy_worker_v2.DTensorPolicyWorkerV2/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 607, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers.models.qwen3_vl.configuration_qwen3_vl.Qwen3VLConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of ApertusConfig, ArceeConfig, AriaTextConfig, BambaConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BitNetConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, BltConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, Cohere2Config, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, DeepseekV2Config, DeepseekV3Config, DiffLlamaConfig, DogeConfig, Dots1Config, ElectraConfig, Emu3Config, ErnieConfig, Ernie4_5Config, Ernie4_5_MoeConfig, Exaone4Config, FalconConfig, FalconH1Config, FalconMambaConfig, FlexOlmoConfig, FuyuConfig, GemmaConfig, Gemma2Config, Gemma3Config, Gemma3TextConfig, Gemma3nConfig, Gemma3nTextConfig, GitConfig, GlmConfig, Glm4Config, Glm4MoeConfig, GotOcr2Config, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GptOssConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, GraniteMoeHybridConfig, GraniteMoeSharedConfig, HeliumConfig, HunYuanDenseV1Config, HunYuanMoEV1Config, JambaConfig, JetMoeConfig, Lfm2Config, LlamaConfig, Llama4Config, Llama4TextConfig, LongcatFlashConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MiniMaxConfig, MinistralConfig, MistralConfig, MixtralConfig, MllamaConfig, ModernBertDecoderConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, Olmo2Config, Olmo3Config, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, Phi4MultimodalConfig, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, Qwen3Config, Qwen3MoeConfig, Qwen3NextConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, SeedOssConfig, SmolLM3Config, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, 
VaultGemmaConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, xLSTMConfig, XmodConfig, ZambaConfig, Zamba2Config, Ministral3Config, Mistral3Config.
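From the traceback, DTensorPolicyWorkerV2 resolves the checkpoint through AutoModelForCausalLM, whose registry does not include Qwen3VLConfig. A minimal sketch of the kind of dispatch that would avoid this (the helper name and the vision_config heuristic are assumptions, not existing NeMo RL code): a vision-language checkpoint's HF config carries a nested vision_config, so the worker could route such models to a multimodal Auto class like transformers' AutoModelForImageTextToText instead.

```python
from types import SimpleNamespace

def pick_auto_class_name(config) -> str:
    """Hypothetical helper: choose a transformers Auto class by config shape.

    Vision-language configs (e.g. Qwen3VLConfig) expose a nested
    ``vision_config``; text-only configs (e.g. Qwen3Config) do not.
    A real fix would return the class itself, e.g.
    transformers.AutoModelForImageTextToText, rather than its name.
    """
    if getattr(config, "vision_config", None) is not None:
        return "AutoModelForImageTextToText"
    return "AutoModelForCausalLM"

# Stand-ins for parsed HF configs (illustration only).
qwen3_vl = SimpleNamespace(model_type="qwen3_vl", vision_config=SimpleNamespace())
qwen3 = SimpleNamespace(model_type="qwen3")

print(pick_auto_class_name(qwen3_vl))  # AutoModelForImageTextToText
print(pick_auto_class_name(qwen3))     # AutoModelForCausalLM
```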

Labels

bug (Something isn't working)