Degraded Results After Retraining #8

Description

@tsunghan-wu

Hi team,

Thanks for the great work. I have a question about the SFT training: after running the official SFT training, the model's performance seems to be lower than the reported numbers, as shown in the table below. Although I'm temporarily using Qwen3-VL-8B-Instruct as the judge, there is still a large gap between the released model and the re-trained model, especially on large benchmarks like MathVista, MathVision, MathVerse, and MMMU-Pro. Some of the gaps fall outside the stderr reported by lmms-eval, which is concerning.

[Image: table comparing the released model's scores with the re-trained model's scores across the benchmarks above]
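For reference, this is roughly how I'm deciding whether a gap is significant; a minimal sketch with placeholder numbers, not my actual eval output:

```python
# Rough significance check: does the re-trained score fall outside the
# stderr band that lmms-eval reports for the released model?
# All numbers below are placeholders, not my real results.

def outside_stderr(released: float, retrained: float, stderr: float, k: float = 1.0) -> bool:
    """True if the two scores differ by more than k * stderr."""
    return abs(released - retrained) > k * stderr

# Example with made-up values: a 3.7-point gap vs. a 1.3 stderr.
print(outside_stderr(released=68.2, retrained=64.5, stderr=1.3))  # True
```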

I haven't run the RL stage and compared its downstream results yet, but I'd like to know whether this is expected. Looking forward to your guidance. Thanks!
