I'm trying to train this model on ImageNet, but after 60k iterations the loss is decreasing only slowly and is currently around 2.5.
Did you reach a lower loss value, or did you see similar behavior?
I just want to check that I'm training this model correctly.
Thanks.