ULMFit training time


(Ben Johnson) #1

Hi All –

Has anyone run the ULMFit training scripts in courses/dl2/imdb_scripts or courses/dl2/imdb.py? If so, roughly how long did

  • finetuning the language model
  • training the classifier

take? I’m running on a Pascal TITAN X, and the finetuning step is taking a long time (~5 hours or so) – so just wanted to see if there’s anything obviously misconfigured on my machine.

~ Ben


(urmas pitsi) #2

I got 95.1% accuracy using a single 1080 Ti. I didn’t time it, but I’d guess it took between 2 and 3 hours. I skipped the learning rate finders, though.
I made some modifications right away to speed up calculations:
max_vocab = 30000 #60000
language model: bs=128 #52
classifier: bs = 64 #48
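
For anyone wondering why a smaller max_vocab speeds things up: the embedding and softmax layers scale with vocabulary size, so capping it shrinks the model. Below is a minimal sketch of capping a vocabulary by token frequency. It is an illustration rather than the notebook's code; the helper name `build_vocab`, the `min_freq` threshold, and the `_unk_`/`_pad_` placeholders are assumptions made for this example.

```python
from collections import Counter

def build_vocab(tokenized_docs, max_vocab=30000, min_freq=2):
    """Keep at most max_vocab of the most frequent tokens seen more than min_freq times."""
    freq = Counter(tok for doc in tokenized_docs for tok in doc)
    itos = [tok for tok, count in freq.most_common(max_vocab) if count > min_freq]
    # Reserve the first two slots for unknown and padding tokens (illustrative names).
    itos = ["_unk_", "_pad_"] + itos
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

# Tiny toy corpus, just to show the cap in action.
docs = [["the", "movie", "was", "great"],
        ["the", "movie", "was", "bad"],
        ["the", "plot"]]
itos, stoi = build_vocab(docs, max_vocab=2, min_freq=1)
print(itos)
```

Tokens that fall outside the cap would be mapped to the unknown token when the text is numericalized.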


(Ben Johnson) #3

Do you have a run script or a fork I could look at to reproduce those results? I get a number of errors when I run the imdb_scripts code – bad paths, missing arguments, etc.


(urmas pitsi) #4

I just ran the Jupyter notebook cells; I haven’t looked into the scripts at all.


(Ben Johnson) #5

Ah alright – thanks. Seems like the parameters (number of epochs, dropout, etc.) in the scripts and the notebook are substantially different.

Is this the version you ran?


(urmas pitsi) #6

yes, should be it. Actually staring at it right now :slight_smile:


(Ben Johnson) #7

Cells 12/13 and 16/17 sort of conflict with each other – which values of dps and lrs did you use?


(urmas pitsi) #8

good point, I used these:
dropout:
dps = np.array([0.4, 0.5, 0.05, 0.3, 0.1])

lrs:
lr=3e-3
lrm = 2.6
lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])

num_workers = 8 (instead of 1, I have 4 cores / 8 cpus)
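
As a quick check of what those lrs values come out to, here is a small standalone sketch. It only reproduces the arithmetic from the lines above with NumPy; the comments about layer groups are an interpretation (one learning rate per layer group, smallest for the earliest group), not something taken from the notebook.

```python
import numpy as np

lr = 3e-3   # learning rate for the last layer group
lrm = 2.6   # each earlier group gets a learning rate lrm times smaller

# One learning rate per layer group, from earliest (smallest) to last (largest).
lrs = np.array([lr / lrm**4, lr / lrm**3, lr / lrm**2, lr / lrm, lr])
print(lrs)
```

The earliest group ends up with a learning rate roughly 46 times smaller than the last one, since 2.6**4 is about 45.7.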


(Ben Johnson) #9

Thanks. And wd?


(urmas pitsi) #10

:slight_smile: that is even better! i ran this:
wd = 1e-7
wd = 0
learn.load_encoder('lm1_enc') # ('lm2_enc')

guess i was using wd=0 after all :slight_smile: didn’t notice that before


(Ben Johnson) #11

OK thanks – looks like I’m getting the results in the paper if I use

  • dps = np.array([0.4, 0.5, 0.05, 0.3, 0.1])
  • lrs = np.array([lr/(lrm**4), lr/(lrm**3), lr/(lrm**2), lr/lrm, lr])
  • wd = 0

I’ll post a link to a clean notebook once it’s done running