I am running Jupyter on my home machine: Ubuntu 20.04 with a GeForce RTX 3080. I’ve reached chapter 5, but when running through the examples I get extremely poor results.
For instance, when running the following code:
# From the start of the chapter
pets = DataBlock(...)
dls = pets.dataloaders(...)
# End of the chapter, discriminative learning rates
from fastai.callback.fp16 import *
learn = cnn_learner(dls, resnet50, metrics=error_rate).to_fp16()
learn.fine_tune(6, freeze_epochs=3)
I end up with
Frozen epochs (freeze_epochs=3):
epoch  train_loss  valid_loss  error_rate  time
0      0.853479    1.446248    0.403924    00:24
1      0.541883    1.482631    0.374831    00:24
2      0.499215    0.940736    0.266576    00:24

Unfrozen epochs:
epoch  train_loss  valid_loss  error_rate  time
0      0.352959    1.294887    0.337618    00:31
1      0.461585    1.218681    0.337618    00:31
2      0.322976    0.738233    0.215156    00:31
3      0.187617    0.961685    0.273342    00:31
4      0.099728    0.731454    0.215156    00:31
5      0.086650    0.680637    0.197564    00:31
A final error rate of ~0.2 is pretty abysmal compared to the ~0.05 the book reports for this example. Other examples in that chapter are even worse: one starts at an error rate of 0.9 and only gets down to 0.8. Could this be a library or driver version issue? I’m currently trying to run the same example on Colab to verify, but it takes hours there and has already timed out a couple of times.
Nvidia driver is 460.32.03, CUDA runtime 11.2.152. I installed the fastai libraries through conda as per the instructions at docs.fast.ai: “conda install -c fastai -c pytorch -c anaconda fastai gh anaconda”.
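In case it helps, here is the stdlib-only snippet I’m using to compare installed package versions between my machine and Colab (the package names are my guess from the conda install line above, so some may legitimately show as not installed):

```python
from importlib import metadata

def pkg_version(pkg):
    """Return the installed version of pkg, or None if it isn't installed."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

# Print whatever versions are present in the current environment.
for pkg in ("fastai", "fastcore", "torch", "torchvision"):
    print(f"{pkg}: {pkg_version(pkg) or 'not installed'}")
```

Running this in the same Jupyter kernel as the training code (rather than a different shell/conda env) should rule out accidentally importing an older fastai than the one conda reports.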