Lesson 4 In-Class Discussion

(Nick) #392

I think the time-consuming part is that the model builds a vocab onto TEXT, which takes a while because the whole corpus has to be parsed each time. I looked at the source, and it should skip this step if you pass in a TEXT object that already has the vocab populated, but when I tested this hypothesis it still took forever to build the darn model. So I’m not sure…
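
To make the "skip the build if the vocab is already populated" idea concrete, here is a minimal sketch in plain Python. The `Field` class and `ensure_vocab` helper are hypothetical stand-ins written for illustration; they are not the torchtext or fastai implementation, and the real check may behave differently.

```python
from collections import Counter


class Field:
    """Hypothetical stand-in for a text field; NOT torchtext's actual class."""

    def __init__(self):
        self.vocab = None
        self.builds = 0  # counts how many times the corpus was scanned

    def build_vocab(self, corpus, min_freq=1):
        # The expensive step: every document in the corpus is tokenized.
        self.builds += 1
        counts = Counter(tok for doc in corpus for tok in doc.split())
        itos = ["<unk>"] + sorted(t for t, c in counts.items() if c >= min_freq)
        self.vocab = {tok: i for i, tok in enumerate(itos)}


def ensure_vocab(field, corpus, min_freq=1):
    """Build the vocab only if the field does not already have one."""
    if field.vocab is None:
        field.build_vocab(corpus, min_freq)
    return field.vocab


corpus = ["the movie was great", "the movie was dull"]
text = Field()
ensure_vocab(text, corpus)
ensure_vocab(text, corpus)  # second call reuses the existing vocab
```

If the model is still slow with a pre-populated field, the time is presumably going somewhere other than the vocab scan.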

(Jeremy Howard) #393

I only just added that check FYI, and it’s not well tested.

(Alex Shenfield) #394

What are the likely GPU memory requirements of using a bptt of 70 in the lesson4-imdb notebook? I’m running out of memory on my 8GB GTX 1070 and wondered what the most effective way of reducing the memory requirements is: reduce bptt or reduce bs?

Any thoughts?
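
As a rough way to compare the two options: the activations an RNN keeps for backprop grow roughly in proportion to bs × bptt, while the weights and embedding matrix do not shrink with either setting. The sketch below only compares tokens per batch. The baseline bs=64 is an assumption for illustration, and the helper name is made up.

```python
def tokens_per_batch(bs, bptt):
    """Number of tokens processed in one batch (a proxy for activation memory)."""
    return bs * bptt


# Assumed baseline for illustration only: bs=64, bptt=70.
baseline = tokens_per_batch(64, 70)

# Relative size of each alternative compared with the baseline.
halve_bs = tokens_per_batch(32, 70) / baseline    # 0.5
trim_bptt = tokens_per_batch(64, 65) / baseline   # about 0.93

print(f"bs=32, bptt=70: {halve_bs:.2f} of baseline")
print(f"bs=64, bptt=65: {trim_bptt:.2f} of baseline")
```

Under this approximation, halving bs frees far more activation memory than trimming bptt by a few steps.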


(Nick) #395

I’m successfully using bs=32 and bptt=70 on an 8GB 1070.

(Alex Shenfield) #396

I get:

RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1512387374934/work/torch/lib/THC/generic/THCStorage.cu:58

when training the model with learner.fit(3e-3, 4, wds=1e-6, cycle_len=1, cycle_mult=2).

However, the graphics card is rendering X as well as training the LanguageModel, so some of the memory (approx 350-450MiB) is taken up by that too. Tracking memory usage with nvidia-smi shows there is not much headroom on the 8GB of memory.

I think part of the problem might also be that, as Jeremy discusses in the video, the bptt = 70 parameter is not 100% fixed, so the size of each batch (its sequence length) can vary somewhat.

I’ll try using bptt of 65 and seeing if that improves things …
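
To illustrate why a nominal bptt of 70 can still produce longer batches, here is a sketch of a randomized sequence-length rule. The constants (a 5% chance of halving, a standard deviation of 5, a floor of 5) are my recollection of the scheme the language model loader uses and should be treated as assumptions, not as the library's exact code.

```python
import random


def sample_seq_len(bptt=70, rng=random):
    """Draw one sequence length around bptt (assumed constants, see above)."""
    # Occasionally use a much shorter base length.
    base = bptt if rng.random() < 0.95 else bptt / 2.0
    # Jitter around the base length, never going below 5 tokens.
    return max(5, int(rng.gauss(base, 5)))


rng = random.Random(0)  # seeded so the run is repeatable
lengths = [sample_seq_len(70, rng) for _ in range(10_000)]
longest = max(lengths)
average = sum(lengths) / len(lengths)

print(f"average length: {average:.1f}, longest: {longest}")
```

The point is that peak memory is set by the longest batch drawn, not by the nominal bptt, so leaving some headroom makes sense.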



(David Bressler) #397

Are there any resources for explaining when/how to use signal values for missing values for deep learning models?

(Jeremy Howard) #398

Yes, the machine learning course covers those ideas reasonably well.
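
For readers who want a concrete picture before going through that material, one common approach is to fill missing numeric values with the column median and add a boolean flag recording where the value was missing, so the model can learn from the missingness itself. This is a plain-Python sketch with a hypothetical helper name and made-up data; it is not taken from the course code.

```python
from statistics import median


def fill_missing(values):
    """Return (filled values, missing flags) for a numeric column."""
    present = [v for v in values if v is not None]
    fill = median(present)  # median of the observed values only
    filled = [fill if v is None else v for v in values]
    is_na = [v is None for v in values]  # the "signal" column
    return filled, is_na


# Made-up example column with two missing entries.
ages = [22.0, None, 35.0, 58.0, None]
filled, is_na = fill_missing(ages)
```

Both outputs would then be fed to the model as separate input columns.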

(David Bressler) #399

Ah great… Looking forward to going through those materials as well… Thanks!