DTensor path does not support Qwen3-VL GRPO #1699

@zpqiu

Description

Describe the bug

The DTensor path fails to load Qwen3-VL models such as Qwen/Qwen3-VL-2B-Instruct: the DTensor policy worker instantiates the checkpoint via AutoModelForCausalLM, which does not recognize Qwen3VLConfig.

Steps/Code to reproduce bug

uv run examples/run_vlm_grpo.py policy.model_name=Qwen/Qwen3-VL-2B-Instruct

Expected behavior

The DTensor path loads Qwen/Qwen3-VL-2B-Instruct successfully and the GRPO run proceeds, as it does for other supported VLMs.

Additional context

Traceback:

Traceback (most recent call last):
  File "/lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_nemorl/users/alexq/RL-main/examples/run_vlm_grpo.py", line 392, in <module>
    main()
  File "/lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_nemorl/users/alexq/RL-main/examples/run_vlm_grpo.py", line 372, in main
    ) = setup(config, tokenizer, dataset, val_dataset, processor=processor)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_nemorl/users/alexq/RL-main/nemo_rl/algorithms/grpo.py", line 575, in setup
    policy.print_node_ip_and_gpu_id()
  File "/lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_nemorl/users/alexq/RL-main/nemo_rl/models/policy/lm_policy.py", line 858, in print_node_ip_and_gpu_id
    results = ray.get(
              ^^^^^^^^
  File "/opt/nemo_rl_venv/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo_rl_venv/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo_rl_venv/lib/python3.12/site-packages/ray/_private/worker.py", line 2882, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo_rl_venv/lib/python3.12/site-packages/ray/_private/worker.py", line 970, in get_objects
    raise value
ray.exceptions.ActorDiedError: The actor died because of an error raised in its creation task, ray::lm_policy-0-0:DTensorPolicyWorkerV2.__init__() (pid=575996, ip=10.65.25.217, actor_id=af1b7d261b8dc3c83b2075fc01000000, repr=DTensorPolicyWorkerV2[rank=0])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_nemorl/users/alexq/RL-main/nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py", line 296, in __init__
    self.model = model_class.from_pretrained(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Automodel-workspace/Automodel/nemo_automodel/_transformers/auto_model.py", line 424, in from_pretrained
    raise e
  File "/opt/nemo-rl/3rdparty/Automodel-workspace/Automodel/nemo_automodel/_transformers/auto_model.py", line 410, in from_pretrained
    model = cls._from_pretrained_parent_class(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Automodel-workspace/Automodel/nemo_automodel/_transformers/auto_model.py", line 288, in _from_pretrained_parent_class
    model = super().from_pretrained(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ray_venvs/nemo_rl.models.policy.workers.dtensor_policy_worker_v2.DTensorPolicyWorkerV2/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 607, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers.models.qwen3_vl.configuration_qwen3_vl.Qwen3VLConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of ApertusConfig, ArceeConfig, AriaTextConfig, BambaConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BitNetConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, BltConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, Cohere2Config, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, DeepseekV2Config, DeepseekV3Config, DiffLlamaConfig, DogeConfig, Dots1Config, ElectraConfig, Emu3Config, ErnieConfig, Ernie4_5Config, Ernie4_5_MoeConfig, Exaone4Config, FalconConfig, FalconH1Config, FalconMambaConfig, FlexOlmoConfig, FuyuConfig, GemmaConfig, Gemma2Config, Gemma3Config, Gemma3TextConfig, Gemma3nConfig, Gemma3nTextConfig, GitConfig, GlmConfig, Glm4Config, Glm4MoeConfig, GotOcr2Config, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GptOssConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, GraniteMoeHybridConfig, GraniteMoeSharedConfig, HeliumConfig, HunYuanDenseV1Config, HunYuanMoEV1Config, JambaConfig, JetMoeConfig, Lfm2Config, LlamaConfig, Llama4Config, Llama4TextConfig, LongcatFlashConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MiniMaxConfig, MinistralConfig, MistralConfig, MixtralConfig, MllamaConfig, ModernBertDecoderConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, Olmo2Config, Olmo3Config, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, Phi4MultimodalConfig, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, Qwen3Config, Qwen3MoeConfig, Qwen3NextConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, SeedOssConfig, SmolLM3Config, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, 
VaultGemmaConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, xLSTMConfig, XmodConfig, ZambaConfig, Zamba2Config, Ministral3Config, Mistral3Config.
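From the traceback, DTensorPolicyWorkerV2 resolves the checkpoint through AutoModelForCausalLM, whose registry does not include Qwen3VLConfig. A minimal sketch of the kind of dispatch that would avoid this (the helper name and the vision_config heuristic are assumptions, not existing NeMo RL code): a vision-language checkpoint's HF config carries a nested vision_config, so the worker could route such models to a multimodal Auto class like transformers' AutoModelForImageTextToText instead.

```python
from types import SimpleNamespace

def pick_auto_class_name(config) -> str:
    """Hypothetical helper: choose a transformers Auto class by config shape.

    Vision-language configs (e.g. Qwen3VLConfig) expose a nested
    ``vision_config``; text-only configs (e.g. Qwen3Config) do not.
    A real fix would return the class itself, e.g.
    transformers.AutoModelForImageTextToText, rather than its name.
    """
    if getattr(config, "vision_config", None) is not None:
        return "AutoModelForImageTextToText"
    return "AutoModelForCausalLM"

# Stand-ins for parsed HF configs (illustration only).
qwen3_vl = SimpleNamespace(model_type="qwen3_vl", vision_config=SimpleNamespace())
qwen3 = SimpleNamespace(model_type="qwen3")

print(pick_auto_class_name(qwen3_vl))  # AutoModelForImageTextToText
print(pick_auto_class_name(qwen3))     # AutoModelForCausalLM
```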

Labels

bug (Something isn't working)