In notebook 12c_ulmfit, Jeremy trains the frozen model for 1 epoch and then the unfrozen model for 10 epochs, at about 7.5 minutes per epoch. When I try to train the same model on Colab, it takes 45 minutes per epoch. Does anyone know why that is?
I can confirm that it takes the same amount of time on an AWS p2.xlarge instance, so I’m not sure what is going on.
My guess is that it is a function of the GPU used. I believe the default GPU on Colab is a K80, which is the same GPU as on an AWS p2.xlarge instance. Jeremy is probably using a much newer GPU (a Volta, perhaps), which would account for the roughly 6x difference. On Colab you do have the option to choose a TPU instance type; try that to see if things speed up.
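If you want to confirm which GPU Colab actually gave you, here is a small sketch. It guards against PyTorch being missing or no GPU being visible, so it is safe to paste into any notebook cell:

```python
# Report which CUDA GPU (if any) the current runtime has.
# Falls back to a message instead of crashing when torch
# is not installed or no GPU is visible.

def gpu_name():
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        # Name of the first visible CUDA device, e.g. "Tesla K80"
        return torch.cuda.get_device_name(0)
    return "no CUDA GPU visible"

print(gpu_name())
```

On a default Colab GPU runtime this typically prints something like "Tesla K80", which would confirm the hardware gap.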
TPUs won’t work with fastai, right? I guess I underestimated how much GPUs have progressed.
Sorry, yes, you are right that TPU support is not there out of the box for PyTorch, but it is coming, and there is a thread on the PyTorch forums about how to set it up.
One thing you can try is mixed precision training, to see if that speeds things up.
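In fastai (v1-era API) that is usually just a one-line change via `to_fp16()`. A minimal sketch, assuming `learn` is the Learner from the ULMFiT notebook; the helper name `enable_mixed_precision` is my own, and it falls back gracefully on fastai versions that predate `to_fp16`:

```python
# Hedged sketch: switch a fastai Learner to mixed-precision training.
# to_fp16() keeps a master FP32 copy of the weights while running the
# forward/backward pass in FP16, which can be much faster on GPUs
# with fast half-precision support (less so on a K80).

def enable_mixed_precision(learn):
    """Return the Learner in FP16 mode if the API supports it,
    otherwise return it unchanged."""
    if hasattr(learn, "to_fp16"):
        return learn.to_fp16()
    return learn
```

Usage would be something like `learn = enable_mixed_precision(learn)` before calling `learn.fit_one_cycle(...)`. Note that on a K80 the speedup may be modest, since older GPUs lack the tensor cores that make FP16 really pay off.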