Same data, but a large accuracy difference between evaluating during training and evaluating standalone after reading the fine-tuned model #37265

irmathebest · 2025-04-03T21:36:41Z

Hi team.
I have been stuck on this problem for a whole week and still cannot figure out the cause.
Env: Python 3.8, transformers 4.28
I am fine-tuning XLM-RoBERTa Base for multi-class classification.
However, when I run trainer.evaluate() during training, it reports an accuracy of 68%, while in the standalone evaluation, which reads the base model, makes predictions, and evaluates them, the accuracy drops to 30%. Is there a reason why this happens, or is it a bug?
