Qwen3 8B: IndexError: too many indices for tensor of dimension 3 when mbridge and VPP are both enabled #4815

This error occurs on NPU with the Megatron + vLLM setup. verl and MindSpeed are both on the latest main-branch code. The error output is as follows:

Traceback (most recent call last):
  File "/verl/verl/workers/megatron_workers.py", line 860, in compute_log_prob
    output, entropys, layers_topk_idx = self.actor.compute_log_prob(data=data, calculate_entropy=True)
  File "/verl/verl/utils/profiler/performance.py", line 105, in f
    return self.log(decorated_function, *args, **kwargs)
  File "/verl/verl/utils/profiler/performance.py", line 118, in log
    output = func(*args, **kwargs)
  File "/verl/verl/workers/actor/megatron_actor.py", line 235, in compute_log_prob
    output = self.forward_backward_batch(...)
  File "/verl/verl/workers/actor/megatron_actor.py", line 683, in forward_backward_batch
    losses_reduced = forward_backward_func(...)
  File "/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 1155, in forward_backward_pipelining_with_interleaving
    output_tensor = forward_step_helper(k, microbatch_id, checkpoint_activations_microbatch)
  File "/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 1004, in forward_step_helper
    output_tensor, num_tokens = forward_step(...)
  File "/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 286, in forward_step
    outputs = loss_func(output_tensor)
  File "/verl/verl/workers/actor/megatron_actor.py", line 463, in loss_func
    stats = post_process_fn(output, data)
  File "/verl/verl/workers/actor/megatron_actor.py", line 213, in compute_logprobs_fn
    log_probs = output["log_probs"][:, -response_length - 1 : -1].contiguous()
IndexError: too many indices for tensor of dimension 3


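The conclusion so far is that **mbridge and VPP cannot be enabled at the same time; enabling both triggers this error**. Has anyone tried enabling mbridge and VPP together in a GPU environment?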
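
For anyone trying to reproduce this, one thing that may be worth checking is what `output` actually is when `compute_logprobs_fn` runs. The failing line indexes `output["log_probs"]`, so it expects a dict-like object. Whether `output` is instead a raw 3-D tensor on some virtual pipeline chunks is only a guess and is not confirmed by the traceback. Below is a minimal debugging sketch. `describe_output` is a hypothetical helper, not part of verl or Megatron-LM, and it relies only on the object having a `.shape` attribute.

```python
from collections.abc import Mapping


def describe_output(output):
    """Summarize a forward-step output: its type, keys, and shapes.

    Hypothetical debugging helper; not an API of verl or Megatron-LM.
    """
    if isinstance(output, Mapping):
        parts = []
        for key, value in output.items():
            shape = getattr(value, "shape", None)
            # Show the shape for tensor-like values, the type name otherwise.
            desc = tuple(shape) if shape is not None else type(value).__name__
            parts.append(f"{key}: {desc}")
        return "dict(" + ", ".join(parts) + ")"

    shape = getattr(output, "shape", None)
    if shape is not None:
        # Tensor-like object that is NOT wrapped in a dict.
        return f"{type(output).__name__}(shape={tuple(shape)})"

    return type(output).__name__
```

Printing `describe_output(output)` just before the failing line in `compute_logprobs_fn`, once with VPP enabled and once with it disabled, should show whether the two runs hand the function differently structured objects.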
