Yes, I have seen the same. Looking at the bug mentioned in the other thread, it looks like it may be a rounding/precision error, so if things get loaded into memory differently, everything may work out fine.
For anyone who finds this thread with a search:
One other thing that took me some time to work out: if you are loading a model to resume training (or even to run inference on large amounts of data), the `load_learner` function loads to the CPU by default, so everything is very slow. If you plan to do additional training, pass a flag to `load_learner` like this:

```python
load_learner('path/to/file', cpu=False)
```

This puts it on the GPU. It took me some time to figure out what was going on and how to fix it.
tags for searches: learner is slow, put learner on GPU
FYI, I tried it both ways and your “divide original LR by 5” way worked better.
So I did my homework, and I think I understand `tta` and its rationale.
I presume that if you use tta on your test set as the validation of your model, then you would also use tta for inference if the model is put into production?
Yeah, otherwise you wouldn’t get similar results in production.
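To make that concrete: TTA runs the model on several augmented copies of each item and averages the predictions (in fastai that would be a call along the lines of `learn.tta(dl=test_dl)`; the exact call is an assumption here, check the docs). A minimal stdlib-only sketch of the averaging step, with made-up probabilities:

```python
# Conceptual sketch of test-time augmentation (TTA): predict on several
# augmented copies of the same image, then average the class probabilities.
# The prediction numbers below are made up for illustration.

def tta_average(pred_lists):
    """Average a list of per-augmentation prediction vectors elementwise."""
    n = len(pred_lists)
    return [sum(ps) / n for ps in zip(*pred_lists)]

# Three augmented views of one image, each giving [p_class0, p_class1]:
preds = tta_average([[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]])
print(preds)  # averaged probabilities, roughly [0.8, 0.2]
```

The point of doing the same thing in production is that the model was evaluated on averaged predictions, so single-view inference would be measuring a slightly different (usually slightly worse) model.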
I copied and ran Jeremy’s code twice using the `convnext_small_in22k` model and the `from_folder` factory method. The overall result on the validation set using `tta` was the same, but each epoch (and the final accuracy and validation loss) differed slightly from the corresponding epochs in the other experiment. I used the same seed for each experiment (and obviously the same hyperparameters).
There must be some randomness, but I thought that using the same seed was supposed to eliminate this. What is the source of randomness? Does this have any implications for creating models (eg. you may get slightly different results if you rerun so run more than once)?
(I also ran using the `DataBlock` API and got a slightly different answer.)
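On the source of the randomness: seeding fixes the Python/NumPy/PyTorch random number generators, but some GPU operations (e.g. certain cuDNN kernels) are nondeterministic unless you also request deterministic algorithms, which fastai's `set_seed(seed, reproducible=True)` attempts to do; even then, a few CUDA ops have no deterministic implementation, and multi-worker data loading can introduce further variation. A stdlib-only sketch of the part that seeding *does* control:

```python
import random

# Seeding a random number generator makes its draws fully reproducible:
def draws(seed, n=5):
    rng = random.Random(seed)   # independent generator with a fixed seed
    return [rng.random() for _ in range(n)]

# Same seed -> identical sequence; different seed -> (almost surely) different.
assert draws(42) == draws(42)
assert draws(42) != draws(43)

# In a training run, however, the seeded RNGs are only one source of
# randomness: nondeterministic GPU kernels and data-loader worker scheduling
# can still make two "identical" runs differ slightly, which is likely what
# you observed. The practical implication is the one you suggest: rerun more
# than once before drawing conclusions from small accuracy differences.
```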
I encountered a problem running this notebook:
“course22/10-scaling-up-road-to-the-top-part-3.ipynb at master · fastai/course22 · GitHub”
To fix it, I modified the code; the fix is in a pull request right now. Please take a look at the pull request. Thanks!