Lesson 4 In-Class Discussion


(Nick) #392

I think the time-consuming part is that the model builds a vocab field on TEXT, which takes a while because the whole corpus has to be parsed each time. I looked at the source, and it should skip this step if you pass in a TEXT object that already has the vocab populated, but when I tested this hypothesis it still took forever to build the darn model. So I’m not sure…


(Jeremy Howard) #393

I only just added that check FYI, and it’s not well tested.


(Alex Shenfield) #394

What are the likely GPU memory requirements of using a bptt of 70 in the lesson4-imdb notebook? I’m running out of memory on my 8GB GTX 1070 and wondered what the most effective way of reducing the memory requirements is: reduce bptt or reduce bs?

Any thoughts?

al3xsh
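
As a rough way to compare the two options: the activations kept for backpropagation through time grow roughly in proportion to the number of tokens in a batch, bs * bptt, so cutting either one by the same factor should save a similar amount of activation memory. The sketch below only does that arithmetic. The function name is hypothetical, the baseline bs of 64 is an assumed example value, and the fixed cost of weights and optimizer state is ignored.

```python
def tokens_per_batch(bs, bptt):
    """Tokens processed per batch; activation memory scales roughly with this."""
    return bs * bptt

# Assumed baseline for comparison (bs=64 is an example value, not a measurement).
baseline = tokens_per_batch(bs=64, bptt=70)

for bs, bptt in [(64, 70), (32, 70), (64, 35), (48, 50)]:
    n = tokens_per_batch(bs, bptt)
    # "relative" is the fraction of the baseline activation footprint.
    print("bs=%3d bptt=%3d tokens=%5d relative=%.2f" % (bs, bptt, n, n / baseline))
```

Halving bs and halving bptt give the same token count here; which one hurts model quality less is a separate question that this arithmetic does not answer.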


(Nick) #395

I’m successfully using bs=32 and bptt=70 on an 8GB 1070


(Alex Shenfield) #396

I get:

RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1512387374934/work/torch/lib/THC/generic/THCStorage.cu:58

when training the model with learner.fit(3e-3, 4, wds=1e-6, cycle_len=1, cycle_mult=2).

However, the graphics card is rendering X as well as training the LanguageModel so that means some of the memory (approx 350-450MiB) is taken up with that too. Having tracked memory usage through nvidia-smi, there is not much headroom on the 8GB of memory.

I think part of the problem might also be that, as Jeremy discusses in the video, the bptt = 70 parameter is not 100% fixed, so the sequence length of each batch (and with it the memory used) can vary somewhat.

I’ll try using a bptt of 65 and see if that improves things …

Cheers,

al3xsh
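
To illustrate why memory use can fluctuate from batch to batch, here is a minimal sketch of a sequence length that is randomized around bptt. The exact rule used here (occasionally halving bptt, then adding Gaussian jitter with a standard deviation of 5 and a floor of 5) is an assumption about how such a loader might behave, not a quote of the library source, and the function name is hypothetical.

```python
import random

def sample_seq_len(bptt, rng, p_full=0.95, jitter=5, min_len=5):
    """Draw one sequence length near bptt (assumed randomization rule)."""
    # Most of the time use the full bptt; occasionally use half of it.
    base = bptt if rng.random() < p_full else bptt / 2
    # Add Gaussian jitter and never go below min_len.
    return max(min_len, int(rng.gauss(base, jitter)))

rng = random.Random(0)  # fixed seed so the run is repeatable
lens = [sample_seq_len(70, rng) for _ in range(1000)]
print("min:", min(lens), "max:", max(lens), "mean:", sum(lens) / len(lens))
```

Under this assumption some batches come out longer than 70 tokens, and peak memory is set by the longest batch rather than the average one, which is why a slightly lower bptt can be enough to avoid an out-of-memory error.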


(David Bressler) #397

Are there any resources for explaining when/how to use signal values for missing values for deep learning models?


(Jeremy Howard) #398

Yes, the machine learning course covers those ideas reasonably well.


(David Bressler) #399

Ah great… Looking forward to going through those materials as well… Thanks!


#401

Hi!
I get a similar error:

RuntimeError: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch_1512387374934/work/torch/lib/THC/THCGeneral.c:70

even though I use only bptt=50 and have made a smaller dataset (200 files in the training set and 100 files in the testing set). Could you please tell me if you solved your problem? I use the conda environment which was provided by the course in January and my version of pytorch in this environment is 0.3.0.

And if anyone else has a suggestion, I would be glad to read it.

Thanks in advance.

EDIT: I shut down my computer and started again, and this time the error didn’t occur. I imagine that my GPU (because I run things locally) had reached its limit, but that after the reboot the GPU was fresh again and, having fewer applications to run at the same time, could perform the task. That said, these issues of GPU capacity are way above my head. Does anyone have a simple tutorial for learning how to manage these issues?
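
One simple starting point is to check how much GPU memory is already in use before training starts. The sketch below asks nvidia-smi for used and total memory in MiB and parses the result. The helper names are hypothetical, and it assumes nvidia-smi is on the PATH; if it is not, the script just prints a message.

```python
import subprocess

# Query used and total memory per GPU as plain CSV numbers (MiB).
QUERY = ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_memory(csv_text):
    """Turn lines like '7650, 8114' into a list of (used, total) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(x.strip()) for x in line.split(","))
        rows.append((used, total))
    return rows

def gpu_memory():
    """Run nvidia-smi and return (used, total) in MiB for each GPU."""
    result = subprocess.run(QUERY, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_memory(result.stdout)

if __name__ == "__main__":
    try:
        for i, (used, total) in enumerate(gpu_memory()):
            print("GPU %d: %d / %d MiB used" % (i, used, total))
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError):
        print("nvidia-smi not available or returned unexpected output")
```

If a large share of memory is already used before training begins, another process (for example an earlier notebook kernel that is still running) is probably holding it; shutting that process down frees the memory without a full reboot.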