I’m having trouble reproducing Jeremy’s result of ~0.93 accuracy on the dog breeds dataset.
I tried reproducing his code, as well as tinkering on my own, and I can’t get past 0.90 accuracy / 0.35 loss, which translates to a Kaggle score of around 0.2-0.3, placing roughly 300th-500th on the leaderboard.
My steps are:
use 224x224 data
precompute=True
lr_find()
train for 3 epochs (overfits after 3)
precompute=False
train for 3 epochs with cycle_len=1 (overfits after 3)
unfreeze
lr_find() - the plot looks really weird here: flat at first, then it shoots up fast
train for 3 epochs with cycle_len=1, cycle_mult=2, and differential learning rates: here it clearly overfits
set 299x299 data
train for 3 epochs with cycle_len=1
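For reference, the cycle_len=1 / cycle_mult=2 combination in the steps above corresponds to cosine annealing with warm restarts (SGDR), where each restart cycle is twice as long as the previous one. Here is a minimal sketch of that schedule in plain Python - not fastai's actual implementation; the function name and the fractional-epoch argument are my own:

```python
import math

def sgdr_lr(base_lr, t, cycle_len=1, cycle_mult=2):
    """Cosine-annealed learning rate with warm restarts (SGDR-style sketch).

    t is the fractional epoch since training started. Each cycle starts
    at base_lr and decays toward 0, then restarts; every cycle is
    cycle_mult times longer than the one before it."""
    length = cycle_len
    start = 0.0
    while t >= start + length:      # find the cycle that t falls in
        start += length
        length *= cycle_mult
    frac = (t - start) / length     # position within the cycle, in [0, 1)
    return base_lr * 0.5 * (1 + math.cos(math.pi * frac))

# At the start of each cycle the learning rate jumps back to base_lr:
print(sgdr_lr(0.01, 0.0))   # cycle 1 starts at epoch 0 -> 0.01
print(sgdr_lr(0.01, 1.0))   # cycle 2 (length 2) starts at epoch 1 -> 0.01
print(sgdr_lr(0.01, 3.0))   # cycle 3 (length 4) starts at epoch 3 -> 0.01
```

With cycle_mult=2, "3 epochs" spans less than two full restart cycles (lengths 1 and 2), which is worth keeping in mind when judging overfitting per cycle.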
Am I doing anything wrong, or do I just need more sophistication and trial-and-error?
Perhaps running it and comparing results to your notebook could help isolate the issue.
In lesson1-breeds.ipynb, the cached output after the first training run (precompute=True) does not overfit after 3 epochs (epoch == 2 --> 0.297677 0.270401 0.922701), though after epoch == 3 the trn_loss is slightly lower than the val_loss.
When I run it on my machine the outputs show the same relationship, but the numbers are slightly different.
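Slightly different numbers between runs are expected: random weight initialization, data augmentation, and shuffling all inject noise. If you want bit-identical reruns for comparison, you have to seed every RNG involved. A minimal sketch - the helper name is mine, and the commented-out torch lines are an assumption about a PyTorch setup, not something this snippet runs:

```python
import random
import numpy as np

def seed_everything(seed=42):
    """Seed the RNGs that typically drive shuffling and augmentation."""
    random.seed(seed)
    np.random.seed(seed)
    # With PyTorch you would also seed torch (assumption, not exercised here):
    # torch.manual_seed(seed)
    # torch.cuda.manual_seed_all(seed)

seed_everything()
a = np.random.rand(3)
seed_everything()
b = np.random.rand(3)
print(np.allclose(a, b))  # True: identical draws after reseeding
```

Even with seeding, some GPU operations are nondeterministic, so small differences in the losses can remain.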
You probably know this already, but I thought I’d suggest it just in case you hadn’t tried it. If the files in the tmp directory have gone wonky, you can always delete them and rerun the notebook. Beware that hard-fought weights saved by learn.save() are stored under tmp. I used the following command while running Jupyter in an Ubuntu-based Docker container.
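The exact command isn’t shown above, but as a hedged sketch, something along these lines clears the cache while keeping the saved weights safe. The data/dogbreeds path, the cache file names, and the backup location are all made up for illustration - substitute your own layout:

```shell
#!/bin/sh
set -e
DATA=data/dogbreeds

# Demo layout: a tmp/ cache with precomputed activations and saved weights
mkdir -p "$DATA/tmp/models"
touch "$DATA/tmp/cached_activations.bc" "$DATA/tmp/models/224_pre.h5"

# Copy the hard-fought weights out of tmp before wiping it
mkdir -p "$DATA/models_backup"
cp -r "$DATA/tmp/models/." "$DATA/models_backup/"

# Now the cache can be deleted and the notebook rerun from scratch
rm -rf "$DATA/tmp"
ls "$DATA"
```

Copying the weights out first means a rerun only costs you the recomputation of the cached activations, not the training itself.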