Should I try to fix the situation where the Recorder's validation accuracy differs from `get_preds()`'s?


When trying to reproduce the ULMFiT paper's results with fastai v1.0.57 on IMDb, AG News, and TREC-6, I noticed that only on TREC-6 does the accuracy reported during training by the Recorder callback differ from the one computed with get_preds() afterwards.

I'm aware of similar topics on the forum, such as:

Assuming there are no mistakes involving drop_last and sampling, and that the difference almost always comes from mini-batch arithmetic, is it safe to ignore the discrepancy and just use get_preds()'s numbers? Is it inevitable when using mixed precision on a small dataset like TREC-6?
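To make the mini-batch arithmetic point concrete, here is a minimal NumPy sketch (not fastai code, and the numbers are made up) showing how an unweighted average of per-batch accuracies can differ from the accuracy computed over all validation samples at once whenever the last batch is smaller than the others:

```python
import numpy as np

# Hypothetical per-sample correctness (1 = correct) for a 10-sample
# validation set, split into batches of 4, 4, and 2.
correct = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1], dtype=float)
batches = [correct[0:4], correct[4:8], correct[8:10]]

# Unweighted mean of per-batch accuracies: 0.75, 0.75, 0.5 -> ~0.667
per_batch = float(np.mean([b.mean() for b in batches]))

# Accuracy over all samples at once (get_preds-style): 7/10 = 0.7
overall = float(correct.mean())

print(per_batch, overall)
```

fastai's metric averaging does weight batches by their size, so on its own this shouldn't explain the gap, but combined with drop_last behavior or fp16 rounding it illustrates the kind of small discrepancy I mean.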

For reference, the numbers I got are listed in:

Thank you.