Degraded Results After Retraining #8

Description

@tsunghan-wu

Hi team,

Thanks for the great work. I have a question about the SFT training: after running the official SFT training, the model's performance seems to be lower than the reported numbers, as shown in the table below. Although I'm temporarily using Qwen3-VL-8B-Instruct as the judge, there is still a large gap between the released model and the re-trained model, especially on large benchmarks like MathVista, MathVision, MathVerse, and MMMU-Pro. Some of the gaps fall outside the stderr reported by lmms-eval, which is concerning.

[Image: table comparing the released model's scores with the re-trained model's scores across the benchmarks above]
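For reference, this is roughly how I'm deciding whether a gap is significant; a minimal sketch with placeholder numbers, not my actual eval output:

```python
# Rough significance check: does the re-trained score fall outside the
# stderr band that lmms-eval reports for the released model?
# All numbers below are placeholders, not my real results.

def outside_stderr(released: float, retrained: float, stderr: float, k: float = 1.0) -> bool:
    """True if the two scores differ by more than k * stderr."""
    return abs(released - retrained) > k * stderr

# Example with made-up values: a 3.7-point gap vs. a 1.3 stderr.
print(outside_stderr(released=68.2, retrained=64.5, stderr=1.3))  # True
```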

I haven't run the RL stage and compared its downstream results yet, but I'd like to know whether this is expected. Looking forward to your guidance. Thanks!
