Hi, when I use Qwen2.5-7B-Instruct as the sampling policy, which step_token should I choose? I found that Qwen2.5-7B-Instruct's output doesn't have a clear step separator.