Conversation

@lilei199908
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings February 10, 2026 14:53
Copilot AI left a comment


Pull request overview

Adds a --critic-train-only mode intended to support running critic training without full actor/critic coupling, and adjusts a few initialization/sync paths to account for this mode.

Changes:

  • Add --critic-train-only CLI flag.
  • Change Megatron args.world_size computation when critic_train_only is enabled.
  • Skip actor↔critic connection and adjust rollout manager wiring / data sync behavior in critic-only mode.
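
As a hedged sketch of the first change: the flag would most likely be a plain boolean switch, and the PR's world-size selection then branches on it. Only the flag name and the `*_num_nodes`/`*_num_gpus_per_node` argument names come from this PR; the parser wiring and defaults below are illustrative assumptions.

```python
import argparse

# Hypothetical sketch of wiring --critic-train-only into slime/utils/arguments.py.
parser = argparse.ArgumentParser()
parser.add_argument("--critic-train-only", action="store_true",
                    help="Run critic training without full actor/critic coupling.")
parser.add_argument("--actor-num-nodes", type=int, default=1)
parser.add_argument("--actor-num-gpus-per-node", type=int, default=8)
parser.add_argument("--critic-num-nodes", type=int, default=1)
parser.add_argument("--critic-num-gpus-per-node", type=int, default=8)

# Example invocation: critic-only training on 2 nodes x 8 GPUs.
args = parser.parse_args(["--critic-train-only", "--critic-num-nodes", "2"])

# world_size selection as introduced by the PR (lines +1503 to +1506):
if args.critic_train_only:
    args.world_size = args.critic_num_nodes * args.critic_num_gpus_per_node
else:
    args.world_size = args.actor_num_nodes * args.actor_num_gpus_per_node
```

Note that argparse converts the dashes to underscores, so the flag surfaces as `args.critic_train_only`.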

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Files reviewed:

  • slime/utils/arguments.py — introduces --critic-train-only and changes how world_size is set for Megatron.
  • slime/ray/placement_group.py — adjusts training model initialization to avoid actor↔critic coupling when critic-only.
  • slime/backends/megatron_utils/actor.py — avoids actor↔critic data sync during critic training when critic-only.


Comment on lines +1503 to +1506
if args.critic_train_only:
    args.world_size = args.critic_num_nodes * args.critic_num_gpus_per_node
else:
    args.world_size = args.actor_num_nodes * args.actor_num_gpus_per_node

Copilot AI Feb 10, 2026


args.world_size is used to validate/derive Megatron parallelism defaults, but in critic_train_only mode it’s set from critic node/GPU counts even though the codebase still initializes an actor RayTrainGroup (with potentially different actor node/GPU counts). This can lead to incompatible TP/PP/DP settings and runtime failures when the actor group initializes. Consider keeping args.world_size based on the actor group (status quo) unless you also skip actor initialization in critic_train_only, or add validation that actor/critic world sizes must match when this flag is set (or introduce separate actor/critic world-size configs).

Suggested change
-if args.critic_train_only:
-    args.world_size = args.critic_num_nodes * args.critic_num_gpus_per_node
-else:
-    args.world_size = args.actor_num_nodes * args.actor_num_gpus_per_node
+actor_world_size = args.actor_num_nodes * args.actor_num_gpus_per_node
+# NOTE:
+# Megatron parallelism defaults and validations are derived from args.world_size,
+# and the training actor RayTrainGroup is also defined based on the actor world size.
+# To avoid mismatches, always keep args.world_size aligned with the actor group.
+if getattr(args, "critic_train_only", False):
+    critic_num_nodes = getattr(args, "critic_num_nodes", None)
+    critic_num_gpus_per_node = getattr(args, "critic_num_gpus_per_node", None)
+    if critic_num_nodes is not None and critic_num_gpus_per_node is not None:
+        critic_world_size = critic_num_nodes * critic_num_gpus_per_node
+        if critic_world_size != actor_world_size:
+            raise ValueError(
+                f"In critic_train_only mode, actor and critic world sizes must match, "
+                f"but got actor_world_size={actor_world_size} "
+                f"and critic_world_size={critic_world_size}."
+            )
+args.world_size = actor_world_size

Comment on lines 169 to 171
actor_model.set_rollout_manager(rollout_manager)
if args.critic_train_only:
    critic_model.set_rollout_manager(rollout_manager)

Copilot AI Feb 10, 2026


When --critic-train-only is set but --use-critic is not, critic_model is None and this will raise an AttributeError. Either validate that critic_train_only implies use_critic, or guard this block with if args.use_critic (and provide a clear error message if not).
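
One possible shape for the guard this comment asks for, as a sketch rather than the actual slime code: validate the flag combination up front so the failure is a clear error instead of an `AttributeError` on a `None` critic. `_Args` and `_Model` below are stand-ins for slime's parsed arguments and Ray model handles.

```python
# Hedged sketch: fail fast when --critic-train-only is set without --use-critic.
class _Args:  # stand-in for the parsed slime arguments
    critic_train_only = True
    use_critic = False  # invalid combination: critic_model would be None

class _Model:  # stand-in for a RayTrainGroup-like handle
    def __init__(self):
        self.rollout_manager = None
    def set_rollout_manager(self, rm):
        self.rollout_manager = rm

def wire_rollout_manager(args, actor_model, critic_model, rollout_manager):
    # Validate the implication critic_train_only => use_critic before wiring.
    if args.critic_train_only and not args.use_critic:
        raise ValueError("--critic-train-only requires --use-critic")
    actor_model.set_rollout_manager(rollout_manager)
    if args.use_critic:
        critic_model.set_rollout_manager(rollout_manager)

# Demonstrate the fast failure on the invalid combination.
try:
    wire_rollout_manager(_Args(), _Model(), None, object())
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```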

Contributor


We can always run

critic_model.set_rollout_manager(rollout_manager)

Maybe change if args.critic_train_only to if use_critic, and remove the critic.set_rollout_manager call inside actor_model.set_rollout_manager.

)
)
if not args.critic_train_only:
    assert len(set(start_rollout_ids)) == 1
Contributor


The critic training should also load the start_rollout_ids.
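
A sketch of what this suggestion amounts to: run the start_rollout_ids consistency check for critic-only training too, rather than skipping it. The helper name and the way ids are collected per rank are assumptions; only `start_rollout_ids` and the assertion come from the snippet above.

```python
# Hypothetical helper: resolve the resumed rollout id from the ids reported by
# each training rank/group. Even in critic-only mode, a checkpoint resume must
# agree on a single rollout id, so the check runs unconditionally.
def resolve_start_rollout_id(start_rollout_ids):
    assert len(set(start_rollout_ids)) == 1, (
        f"inconsistent start rollout ids: {start_rollout_ids}"
    )
    return start_rollout_ids[0]

# Example: three ranks all resuming from rollout 3.
start_rollout_id = resolve_start_rollout_id([3, 3, 3])
```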

