30GB Too Slow! Fine Tuning IMDB language model. 10 NLP

daveramseymusic · August 4, 2021, 3:31pm

In Chapter 10 NLP “Fine Tuning the Language Model” it shows Jeremy’s learn.fit_one_cycle taking 11minutes.

Mine takes 1hr 45 minutes and I had to downsize the ‘batch size’ = 64 and ‘sequence length’ =40 just to avoid getting CUDA OOM error.

I’m using a paperspace 30G GPU.

I sense that something is wrong. Does anyone know why it’s taking so long or where I could look for tips on speeding it up?

any help at all would be much appreciated

Screen Shot 2021-08-04 at 10.29.24 AM

VishnuSubramanian · August 8, 2021, 5:20am

I am not sure if the 30G is GPU memory. It could be probably CPU memory. I tried running the same code on RTX 5000 in jarvislabs.ai, it took approximately 13.46 seconds.

bwarner · August 8, 2021, 5:20am

Since you are getting a CUDA OOM error, it sounds like the model is being trained on the GPU and not via CPU.

I’m not seeing a Paperspace 30G GPU on their instance list, but if it’s one of the cheaper GPUs, perhaps a M4000 or P4000, those both have significantly less RAM and compute power then the Titan RTX which the lesson was ran on.

If that’s the case, you’d have better luck using Colab or Kaggle. Kaggle’s P100s are still a lot slower than the Titan RTX, but should be leaps and bounds better than a M4000 or P4000. Likewise a T4 on the free tier of Colab will be slower but also should be an improvement, especially if you used Mixed Precision.

daveramseymusic · August 9, 2021, 1:44pm

Thanks so much for replying. I might try Kaggle or Colab next.

Currently, it’s showing that I’m using the Free P5000 (30GB, 8CPUs)
Screen Shot 2021-08-09 at 9.38.49 AM

before I was using the M4000.

daveramseymusic · September 15, 2021, 8:21pm

Update!

About 3 weeks ago I splurged and started paying $8 per month to get access to more GPUs

Currently I find the Free-RTX4000 30GPU Ram | 8CPUs to be very fast and almost always available.

In case anyone is looking to upgrade I found this simple and well worth it.

rogerbock · January 18, 2022, 2:33pm

I am running into a similar problem. When I run on a TPU on Google Colab, it looks like the below cell will take hours to complete.

from fastai.text.all import *

dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)

daveramseymusic · January 20, 2022, 3:05pm

Hey Roger. You might try paper space just for one month to see if that RTX4000 runs faster for you. They also have faster GPUs you can rent by the hour for a couple of dollars an hour.

note: I never seemed to get mine to run as quickly as Jeremy’s example, but about twice as fast was super useful.

bwarner · January 20, 2022, 9:05pm

Fastai currently doesn’t support TPUs, so unless you are using a custom callback your code is training on CPU. A GPU instance should train faster (even a K80, I’d imagine).

thelastjedi · June 10, 2022, 12:26pm

I am also using Paperspace Pro and running on RTX 5000, but training time is still ~1hr. What is your training time?