Conversation

@lilei199908
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings February 10, 2026 14:53
Copilot AI left a comment


Pull request overview

Adds a --critic-train-only mode intended to support running critic training without full actor/critic coupling, and adjusts a few initialization/sync paths to account for this mode.

Changes:

  • Add --critic-train-only CLI flag.
  • Change Megatron args.world_size computation when critic_train_only is enabled.
  • Skip actor↔critic connection and adjust rollout manager wiring / data sync behavior in critic-only mode.
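
As a hedged sketch of the first change: the flag would most likely be a plain boolean switch, and the PR's world-size selection then branches on it. Only the flag name and the `*_num_nodes`/`*_num_gpus_per_node` argument names come from this PR; the parser wiring and defaults below are illustrative assumptions.

```python
import argparse

# Hypothetical sketch of wiring --critic-train-only into slime/utils/arguments.py.
parser = argparse.ArgumentParser()
parser.add_argument("--critic-train-only", action="store_true",
                    help="Run critic training without full actor/critic coupling.")
parser.add_argument("--actor-num-nodes", type=int, default=1)
parser.add_argument("--actor-num-gpus-per-node", type=int, default=8)
parser.add_argument("--critic-num-nodes", type=int, default=1)
parser.add_argument("--critic-num-gpus-per-node", type=int, default=8)

# Example invocation: critic-only training on 2 nodes x 8 GPUs.
args = parser.parse_args(["--critic-train-only", "--critic-num-nodes", "2"])

# world_size selection as introduced by the PR (lines +1503 to +1506):
if args.critic_train_only:
    args.world_size = args.critic_num_nodes * args.critic_num_gpus_per_node
else:
    args.world_size = args.actor_num_nodes * args.actor_num_gpus_per_node
```

Note that argparse converts the dashes to underscores, so the flag surfaces as `args.critic_train_only`.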

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Files reviewed:

  • slime/utils/arguments.py — introduces --critic-train-only and changes how world_size is set for Megatron.
  • slime/ray/placement_group.py — adjusts training model initialization to avoid actor↔critic coupling when critic-only.
  • slime/backends/megatron_utils/actor.py — avoids actor↔critic data sync during critic training when critic-only.


Comment on lines +1503 to +1506
if args.critic_train_only:
    args.world_size = args.critic_num_nodes * args.critic_num_gpus_per_node
else:
    args.world_size = args.actor_num_nodes * args.actor_num_gpus_per_node

Copilot AI Feb 10, 2026


args.world_size is used to validate/derive Megatron parallelism defaults, but in critic_train_only mode it’s set from critic node/GPU counts even though the codebase still initializes an actor RayTrainGroup (with potentially different actor node/GPU counts). This can lead to incompatible TP/PP/DP settings and runtime failures when the actor group initializes. Consider keeping args.world_size based on the actor group (status quo) unless you also skip actor initialization in critic_train_only, or add validation that actor/critic world sizes must match when this flag is set (or introduce separate actor/critic world-size configs).

Suggested change
-if args.critic_train_only:
-    args.world_size = args.critic_num_nodes * args.critic_num_gpus_per_node
-else:
-    args.world_size = args.actor_num_nodes * args.actor_num_gpus_per_node
+actor_world_size = args.actor_num_nodes * args.actor_num_gpus_per_node
+# NOTE:
+# Megatron parallelism defaults and validations are derived from args.world_size,
+# and the training actor RayTrainGroup is also defined based on the actor world size.
+# To avoid mismatches, always keep args.world_size aligned with the actor group.
+if getattr(args, "critic_train_only", False):
+    critic_num_nodes = getattr(args, "critic_num_nodes", None)
+    critic_num_gpus_per_node = getattr(args, "critic_num_gpus_per_node", None)
+    if critic_num_nodes is not None and critic_num_gpus_per_node is not None:
+        critic_world_size = critic_num_nodes * critic_num_gpus_per_node
+        if critic_world_size != actor_world_size:
+            raise ValueError(
+                f"In critic_train_only mode, actor and critic world sizes must match, "
+                f"but got actor_world_size={actor_world_size} "
+                f"and critic_world_size={critic_world_size}."
+            )
+args.world_size = actor_world_size

Comment on lines 169 to 171
actor_model.set_rollout_manager(rollout_manager)
if args.critic_train_only:
    critic_model.set_rollout_manager(rollout_manager)

Copilot AI Feb 10, 2026


When --critic-train-only is set but --use-critic is not, critic_model is None and this will raise an AttributeError. Either validate that critic_train_only implies use_critic, or guard this block with if args.use_critic (and provide a clear error message if not).
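
One possible shape for the guard this comment asks for, as a sketch rather than the actual slime code: validate the flag combination up front so the failure is a clear error instead of an `AttributeError` on a `None` critic. `_Args` and `_Model` below are stand-ins for slime's parsed arguments and Ray model handles.

```python
# Hedged sketch: fail fast when --critic-train-only is set without --use-critic.
class _Args:  # stand-in for the parsed slime arguments
    critic_train_only = True
    use_critic = False  # invalid combination: critic_model would be None

class _Model:  # stand-in for a RayTrainGroup-like handle
    def __init__(self):
        self.rollout_manager = None
    def set_rollout_manager(self, rm):
        self.rollout_manager = rm

def wire_rollout_manager(args, actor_model, critic_model, rollout_manager):
    # Validate the implication critic_train_only => use_critic before wiring.
    if args.critic_train_only and not args.use_critic:
        raise ValueError("--critic-train-only requires --use-critic")
    actor_model.set_rollout_manager(rollout_manager)
    if args.use_critic:
        critic_model.set_rollout_manager(rollout_manager)

# Demonstrate the fast failure on the invalid combination.
try:
    wire_rollout_manager(_Args(), _Model(), None, object())
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```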

Contributor


We can always run

critic_model.set_rollout_manager(rollout_manager)

Maybe change if args.critic_train_only to if use_critic, and remove the critic.set_rollout_manager call inside actor_model.set_rollout_manager.

)
)
if not args.critic_train_only:
    assert len(set(start_rollout_ids)) == 1
Contributor


The critic training should also load the start_rollout_ids.
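
A sketch of what this suggestion amounts to: run the start_rollout_ids consistency check for critic-only training too, rather than skipping it. The helper name and the way ids are collected per rank are assumptions; only `start_rollout_ids` and the assertion come from the snippet above.

```python
# Hypothetical helper: resolve the resumed rollout id from the ids reported by
# each training rank/group. Even in critic-only mode, a checkpoint resume must
# agree on a single rollout id, so the check runs unconditionally.
def resolve_start_rollout_id(start_rollout_ids):
    assert len(set(start_rollout_ids)) == 1, (
        f"inconsistent start rollout ids: {start_rollout_ids}"
    )
    return start_rollout_ids[0]

# Example: three ranks all resuming from rollout 3.
start_rollout_id = resolve_start_rollout_id([3, 3, 3])
```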

