Training a language model on Colab takes a long time

(David Pfahler) #1

In notebook 12c_ulmfit, Jeremy trains the frozen model for 1 epoch and then the unfrozen model for 10 epochs, at about 7.5 minutes per epoch. When I try to train the same model on Colab, it takes 45 minutes per epoch. Does anyone know why that is?
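For scale, here is a quick sketch of what those per-epoch times add up to over the whole schedule. It uses only the figures quoted above and assumes (this is an assumption, the post does not say so explicitly) that the frozen epoch takes about as long as an unfrozen one.

```python
# Figures quoted in the post; the variable names are illustrative only.
FROZEN_EPOCHS = 1
UNFROZEN_EPOCHS = 10
REFERENCE_MIN_PER_EPOCH = 7.5   # time per epoch reported in the notebook
COLAB_MIN_PER_EPOCH = 45.0      # time per epoch observed on Colab

# Assumption: every epoch (frozen or unfrozen) costs about the same.
total_epochs = FROZEN_EPOCHS + UNFROZEN_EPOCHS

reference_total_min = total_epochs * REFERENCE_MIN_PER_EPOCH
colab_total_min = total_epochs * COLAB_MIN_PER_EPOCH
slowdown = COLAB_MIN_PER_EPOCH / REFERENCE_MIN_PER_EPOCH

print(f"Reference run: {reference_total_min:.1f} min")
print(f"Colab run:     {colab_total_min:.1f} min ({colab_total_min / 60:.2f} h)")
print(f"Slowdown:      {slowdown:.1f}x")
```

Under that assumption the full schedule goes from roughly an hour and a half to a bit over eight hours.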

(David Pfahler) #2

I can confirm that on an AWS p2.xlarge instance it takes the same amount of time as on Colab, so I’m not sure what is going on.

(Norman Secord) #3

My guess is that it comes down to the GPU. I believe the default GPU on Colab is a K80, which is the same GPU as in an AWS p2.xlarge instance. Jeremy is probably using a much newer GPU (Volta?), which would account for the 6x difference. Colab also lets you choose a TPU runtime. Try that to see if things speed up.
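To check which GPU a session was actually given, something like the following should work from a notebook cell. It is a minimal sketch: `describe_gpu` is a hypothetical helper name, and it only relies on PyTorch's `torch.cuda` queries, falling back to a message if PyTorch or a GPU is missing.

```python
def describe_gpu():
    """Return a one-line description of the first visible CUDA device."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA GPU visible"
    # Properties of device 0: name, memory in bytes, compute capability.
    props = torch.cuda.get_device_properties(0)
    mem_gib = props.total_memory / 1024**3
    return (f"{props.name}, {mem_gib:.1f} GiB, "
            f"compute capability {props.major}.{props.minor}")

gpu_description = describe_gpu()
print(gpu_description)
```

Running this on both machines makes it easy to compare the hardware before comparing epoch times.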

(David Pfahler) #4

TPUs won’t work with fastai, right? I guess I underestimated how much GPUs have progressed.

(Norman Secord) #5

Sorry, yes, you are right: TPU support is not there out of the box for PyTorch. It is coming, though, and there is something on the PyTorch forums about how to set it up.

(Max Tian) #6

One thing you can try is mixed precision training, to see if that speeds things up.
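A rough sketch of what a mixed precision training step looks like in plain PyTorch, using `torch.autocast` and `GradScaler`. This is not the notebook's own code: the tiny `nn.Linear` model and the random batch are hypothetical stand-ins for the language model and its data, and the step falls back to full precision when no CUDA device is present. How much it helps depends on the GPU; as far as I know, older cards such as the K80 have no Tensor Cores, so the gain there may be small.

```python
import torch
from torch import nn

def train_step(model, batch, targets, optimizer, scaler, device):
    """Run one optimisation step, using float16 autocast only on CUDA."""
    use_amp = device.type == "cuda"
    optimizer.zero_grad()
    # Inside autocast, eligible ops run in reduced precision on the GPU.
    with torch.autocast(device_type=device.type, enabled=use_amp):
        loss = nn.functional.cross_entropy(model(batch), targets)
    # The scaler multiplies the loss to keep float16 gradients from underflowing,
    # then unscales before the optimizer update. It is a no-op when disabled.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(16, 4).to(device)  # hypothetical stand-in for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=(device.type == "cuda"))

x = torch.randn(8, 16, device=device)         # fake inputs
y = torch.randint(0, 4, (8,), device=device)  # fake class labels
loss = train_step(model, x, y, optimizer, scaler, device)
print(f"loss after one step: {loss:.4f}")
```

If you are using the fastai library's `Learner`, calling `to_fp16()` on it is the usual shortcut for the same idea.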

(William Minshew) #7

You can spin up a notebook on a GTX 1080 Ti with emrys for free with the current promo, if you’re interested in speeding it up.
