ULMFit training time


(Ben Johnson) #1

Hi All –

Has anyone run the ULMFit training scripts in courses/dl2/imdb_scripts or courses/dl2/imdb.py? If so, roughly how long did

  • finetuning the language model
  • training the classifier

take? I’m running on a Pascal TITAN X, and the fine-tuning step is taking a long time (around 5 hours), so I just wanted to check whether there’s anything obviously misconfigured on my machine.

~ Ben


(urmas pitsi) #2

I’ve got 95.1% accuracy using a single 1080 Ti. I didn’t time it exactly, but I’d guess it took between 2 and 3 hours; I skipped the learning rate finders, though.
I made some modifications right away to speed up the computation:

  • max_vocab = 30000 (was 60000)
  • language model: bs = 128 (was 52)
  • classifier: bs = 64 (was 48)
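For context, a minimal sketch of where those constants sit in the dl2/imdb notebook. The vocab-building part mirrors what the notebook does; treat tok_trn (the list of tokenized training documents built earlier in the notebook) as an assumption:

from collections import Counter

max_vocab = 30000   # language-model vocab cap (notebook uses 60000)
min_freq  = 2       # drop tokens seen fewer than this many times
lm_bs     = 128     # language-model batch size (notebook uses 52)
clas_bs   = 64      # classifier batch size (notebook uses 48)

def build_vocab(tok_trn):
    # Count every token, keep the most frequent max_vocab of them,
    # and reserve the first two slots for the unknown and padding tokens.
    freq = Counter(tok for doc in tok_trn for tok in doc)
    itos = [tok for tok, cnt in freq.most_common(max_vocab) if cnt > min_freq]
    itos.insert(0, '_pad_')
    itos.insert(0, '_unk_')
    return itos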


(Ben Johnson) #3

Do you have a run script or a fork I could look at to reproduce those results? I get a number of errors when I run the imdb_scripts code – bad paths, missing arguments, etc.


(urmas pitsi) #4

I just ran the Jupyter notebook cells; I haven’t looked at the scripts at all.


(Ben Johnson) #5

Ah, alright – thanks. It seems like the parameters (number of epochs, dropout, etc.) in the notebook and the scripts are substantially different.

Is this the version you ran?


(urmas pitsi) #6

Yes, that should be it. I’m actually staring at it right now :slight_smile:


(Ben Johnson) #7

Cells 12/13 and 16/17 sort of conflict with each other – which values of dps and lrs did you use?


(urmas pitsi) #8

good point, I used these:
dropout:
dps = np.array([0.4, 0.5, 0.05, 0.3, 0.1])

lrs:
lr=3e-3
lrm = 2.6
lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])

num_workers = 8 (instead of 1; I have 4 cores / 8 threads)
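To make the learning-rate schedule concrete, this is what those five discriminative rates work out to (plain NumPy, nothing fastai-specific):

import numpy as np

lr  = 3e-3
lrm = 2.6   # ratio between adjacent layer groups
lrs = np.array([lr/lrm**4, lr/lrm**3, lr/lrm**2, lr/lrm, lr])
print(lrs)  # roughly [6.6e-05, 1.7e-04, 4.4e-04, 1.2e-03, 3e-03], lowest rate for the earliest layers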


(Ben Johnson) #9

Thanks. And wd?


(urmas pitsi) #10

:slight_smile: that’s even better! I ran this:
wd = 1e-7
wd = 0
learn.load_encoder('lm1_enc')  # ('lm2_enc')

Guess I was using wd=0 after all :slight_smile: I didn’t notice that before.


(Ben Johnson) #11

OK thanks – looks like I’m getting the results in the paper if I use

  • dps = np.array([0.4, 0.5, 0.05, 0.3, 0.1])
  • lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])
  • wd = 0

I’ll post a link to a clean notebook once it’s done running.
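In case it helps anyone reproducing this later, here are those settings collected in one place. The commented learn calls are the ones quoted in this thread and assume the classifier learner built in dl2/imdb.ipynb (fastai 0.7), so treat this as a sketch rather than a drop-in script:

import numpy as np

# Settings that reproduced the paper's IMDB result in this thread
dps = np.array([0.4, 0.5, 0.05, 0.3, 0.1])                     # classifier dropouts
lr, lrm = 3e-3, 2.6
lrs = np.array([lr/lrm**4, lr/lrm**3, lr/lrm**2, lr/lrm, lr])  # discriminative learning rates
wd = 0                                                         # weight decay

# Assuming learn is the classifier learner from the notebook:
# learn.load_encoder('lm1_enc')            # load the fine-tuned LM encoder
# learn.fit(lrs, 1, wds=wd, cycle_len=1)   # then continue with the notebook's gradual unfreezing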


(Daniel Armstrong) #12

I would love to see the notebook, if you still have it.


(Charles) #14

Hey! I’m trying to work through imdb.ipynb right now and would love any feedback or pointers.

I have set up an account on Google Cloud and am running on an NVIDIA K80 GPU there.

Right now I’m running the first cell where we are fitting the model:
learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1)

and it’s only going at about 1 iteration/sec, and my Volatile GPU-Util is at almost 100%. Would anyone know if this is really all the GPU can handle, or if there’s anything I can do to hurry things along?

Thanks! :slight_smile:


(William Collins) #15

Hi! I just discovered this library and jumped right in (I intend to take the class later when I have some time). Because of that, and because I used the AWS Deep Learning AMI rather than the fastai AMI, I may have missed some system setup that would improve the performance I’m seeing. Right now, training on a corpus of around 250 million tokens takes 8.5 hours per epoch.

I would love to know whether this sounds reasonable, and if not, what I could do to improve it.

Below are some relevant details.

I’m using the ULMFiT model presented in the IMDB notebook (an AWD-LSTM pretrained on WikiText-103, bs=52, bptt=70, embedding dim=400, hidden size=1150, 3 layers).

I’m training in a Jupyter notebook on a single AWS EC2 p2.xlarge instance (one NVIDIA K80 GPU, 4 vCPUs, 61 GiB RAM) using the Deep Learning Ubuntu AMI.

  • My vocab size is 50,000.
  • My pre-tokenized corpus consists of 247,289,534 tokens; no cleaning/tokenization is done during training.
  • There are 67,935 iterations/batches per epoch.
  • Each epoch takes about 8.51 hours.
  • Each iteration takes about 0.45 seconds.

Thanks!

  • Bill
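As a quick sanity check, those figures are internally consistent: with bs=52 and bptt around 70, ~247M tokens gives roughly the 67,935 iterations you report, and at 0.45 s per iteration that is about 8.5 hours per epoch. So the obvious levers are faster or fewer iterations (a larger batch size, a smaller vocab, or a faster GPU than the K80) rather than a bookkeeping error. The arithmetic, using only the numbers from the post:

# Sanity check of the reported figures (numbers taken from the post above)
tokens     = 247_289_534
bs, bptt   = 52, 70          # the LM loader varies bptt slightly per batch, hence the small mismatch
sec_per_it = 0.45

iters_per_epoch = tokens / (bs * bptt)        # ~67,937 vs. the reported 67,935
epoch_hours     = 67_935 * sec_per_it / 3600  # ~8.49 h vs. the reported ~8.51 h
print(round(iters_per_epoch), round(epoch_hours, 2))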