Training a language model on Colab takes a long time

davidpfahler · August 7, 2019, 12:04pm

In notebook 12c_ulmfit, Jeremy trains the frozen model for 1 epoch and then the unfrozen model for 10 epochs, at about 7.5 minutes per epoch. When I try to train the same model on Colab, it takes 45 minutes per epoch. Does anyone know why that is?

davidpfahler · August 7, 2019, 3:41pm

I can confirm that it takes the same amount of time as on Colab when I run it on an AWS p2.xlarge instance, so I’m not sure what is going on.

nsecord · August 7, 2019, 4:06pm

My guess is that it comes down to the GPU used. The default GPU on Colab is, I believe, a K80, which is the same GPU as on an AWS p2.xlarge instance. Jeremy is probably using a much newer GPU (Volta?), which would account for the 6x difference (45 minutes vs. 7.5 minutes per epoch). On Colab you do have the option to choose a TPU instance type. Try that to see if things speed up.
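
A quick way to check which GPU a runtime actually gives you is to ask PyTorch for the device name. This is only a minimal sketch: the helper name `describe_gpu` is made up for illustration, and it assumes PyTorch is the framework in use (it falls back to a short message if PyTorch is not installed or no GPU is visible).

```python
def describe_gpu():
    """Return the name of the first CUDA device, or a short note if none is usable.

    `describe_gpu` is a hypothetical helper, not part of any library.
    """
    try:
        import torch  # assumption: PyTorch is installed in the runtime
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    # Index 0 is the first (and on Colab usually the only) GPU.
    return torch.cuda.get_device_name(0)


print(describe_gpu())
```

Running this in both environments should show whether the two setups really are on the same card.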

davidpfahler · August 7, 2019, 4:08pm

TPUs won’t work with fastai, right? I guess I underestimated how much GPUs have progressed.

nsecord · August 7, 2019, 4:21pm

Sorry, yes, you are right that TPU support is not there out of the box for PyTorch, but it is coming, and there is something on the PyTorch forums about how to set it up.

maxmatical · August 7, 2019, 6:14pm

One thing you can try is mixed precision training, to see if that speeds things up.
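
How much mixed precision helps depends a lot on the hardware: the big fp16 speedups come from Tensor Cores, which NVIDIA introduced with Volta (CUDA compute capability 7.0), while the K80 is an older Kepler card (compute capability 3.7), so the gain there may be small. Here is a rough sketch for checking this. `has_tensor_cores` and `fp16_likely_to_help` are hypothetical helper names, and the 7.0 threshold is a rule of thumb, not a guarantee of any speedup.

```python
def has_tensor_cores(capability):
    """Rule of thumb: Tensor Cores exist from compute capability 7.0 (Volta) upward.

    `capability` is a (major, minor) tuple, the shape returned by
    torch.cuda.get_device_capability.
    """
    major, _minor = capability
    return major >= 7


def fp16_likely_to_help():
    """Return True/False for the first GPU, or None if it cannot be determined."""
    try:
        import torch  # assumption: PyTorch is installed in the runtime
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return has_tensor_cores(torch.cuda.get_device_capability(0))


print(has_tensor_cores((3, 7)))  # K80 (Kepler)
print(has_tensor_cores((7, 0)))  # V100 (Volta)
print(fp16_likely_to_help())
```

In fastai v1, mixed precision is switched on with `learn.to_fp16()` on the Learner; the course notebooks use their own code, so the equivalent there may differ.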

wminshew · August 9, 2019, 3:41am

You can spin up a notebook on a GTX 1080 Ti with emrys for free with the current promo, if you’re interested in speeding it up.