Lesson 4 In-Class Discussion

zengid · January 11, 2018, 8:52am

I think the part that takes up time is that the model builds a vocab field on TEXT, which takes a while because the corpus has to be parsed each time. I looked at the source, and it should skip this step if you pass in a TEXT object that already has the vocab populated, but when I tested this hypothesis it still took forever to build the darn model. So I’m not sure…

jeremy · January 12, 2018, 12:42am

I only just added that check FYI, and it’s not well tested.

al3xsh · January 12, 2018, 4:11pm

What are the likely GPU memory requirements of using a bptt of 70 in the lesson4-imdb notebook? I’m running out of memory on my 8GB GTX 1070 and wondered what the most effective way of reducing the memory requirements is: reduce bptt or reduce bs?

Any thoughts?

al3xsh
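
As a rough way to compare the two options: the number of tokens processed per batch is bs * bptt, and activation memory tends to grow roughly in proportion to it, while the model weights do not shrink with either setting. The sketch below only does that arithmetic. The baseline of bs=64 is an assumption for illustration, and the ratios are an estimate, not a measurement of real GPU usage.

```python
def tokens_per_batch(bs, bptt):
    """Number of tokens in one batch: bs sequences of bptt tokens each."""
    return bs * bptt


def relative_activation_cost(bs, bptt, base_bs, base_bptt):
    """Token count relative to a baseline setting (a rough proxy only)."""
    return tokens_per_batch(bs, bptt) / tokens_per_batch(base_bs, base_bptt)


# Baseline values are assumptions for illustration, not taken from the thread.
BASE_BS, BASE_BPTT = 64, 70

for bs, bptt in [(64, 70), (32, 70), (64, 35), (64, 65)]:
    ratio = relative_activation_cost(bs, bptt, BASE_BS, BASE_BPTT)
    print(f"bs={bs:3d} bptt={bptt:3d} tokens={tokens_per_batch(bs, bptt):5d} "
          f"relative={ratio:.2f}")
```

Under this proxy, halving bs and halving bptt save about the same amount, so the choice between them comes down to how each affects training rather than memory alone.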

nickl · January 15, 2018, 11:21pm

I’m successfully using bs=32 and bptt=70 on an 8GB 1070

al3xsh · January 16, 2018, 10:08am

I get:

RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1512387374934/work/torch/lib/THC/generic/THCStorage.cu:58

when training the model with learner.fit(3e-3, 4, wds=1e-6, cycle_len=1, cycle_mult=2).

However, the graphics card is rendering X as well as training the LanguageModel so that means some of the memory (approx 350-450MiB) is taken up with that too. Having tracked memory usage through nvidia-smi, there is not much headroom on the 8GB of memory.

I think part of the problem might also be that, as Jeremy discusses in the video, the bptt = 70 parameter is not 100% fixed, so the sequence length of each batch (and with it the memory each batch needs) can vary somewhat.

I’ll try using bptt of 65 and seeing if that improves things …

Cheers,

al3xsh
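
To illustrate why a varying bptt matters for memory, here is a minimal sketch that assumes the sequence length is drawn from a normal distribution centred on bptt, with an occasional half-length batch. The sampling scheme and all of its constants (standard deviation of 5, 95% full-length batches, minimum length of 5) are assumptions for illustration and were not read from the fastai source.

```python
import random


def sample_seq_len(bptt, rng, std=5, min_len=5, p_full=0.95):
    """Draw one sequence length around bptt (hypothetical scheme).

    Most batches are centred on bptt; a few are centred on bptt / 2.
    """
    centre = bptt if rng.random() < p_full else bptt / 2.0
    return max(min_len, int(rng.gauss(centre, std)))


rng = random.Random(0)  # fixed seed so the run is repeatable
lengths = [sample_seq_len(70, rng) for _ in range(1000)]

# The longest sampled batch, not the nominal bptt, sets the peak memory.
print("shortest:", min(lengths), "longest:", max(lengths))
```

If the real sampling behaves anything like this, some batches will be longer than 70 tokens, so a configuration that only just fits at bptt=70 can still run out of memory partway through an epoch.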

superexistential · January 31, 2018, 11:54pm

Are there any resources for explaining when/how to use signal values for missing values for deep learning models?

jeremy · February 1, 2018, 1:59am

Yes the machine learning course covers those ideas reasonably well.

superexistential · February 1, 2018, 3:06am

Ah great… Looking forward to going through those materials as well… Thanks!

abercher · June 14, 2018, 11:34am

Hi!
I get a similar error:

RuntimeError: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch_1512387374934/work/torch/lib/THC/THCGeneral.c:70

even though I use only bptt=50 and have made a smaller dataset (200 files in the training set and 100 files in the testing set). Could you please tell me if you solved your problem? I use the conda environment which was provided by the course in January and my version of pytorch in this environment is 0.3.0.

And if anyone else has a suggestion, I would be glad to read it.

Thanks in advance.

EDIT: I shut down my computer and started again, and this time the error didn’t occur. I imagine that my GPU (because I run things locally) had reached its limit, but that after the reboot the GPU was fresh again and, with fewer applications running at the same time, could perform the task. That said, these issues of GPU capacity are way above my head. Does anyone have a simple tutorial on how to manage these issues?
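
One simple starting point is to check how much GPU memory is already in use before training, using the nvidia-smi tool mentioned earlier in the thread. The sketch below is a minimal example that assumes nvidia-smi is on the PATH; the helper names parse_memory and gpu_memory are hypothetical, not part of any library.

```python
import subprocess

# Ask nvidia-smi for used and total memory in MiB, one CSV row per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text):
    """Turn rows like '7600, 8192' into (used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory():
    """Run nvidia-smi and return (used_mib, total_mib) for each GPU."""
    result = subprocess.run(
        QUERY, stdout=subprocess.PIPE, check=True, universal_newlines=True
    )
    return parse_memory(result.stdout)
```

Calling gpu_memory() before starting a notebook shows how much memory other processes (a desktop session, or an earlier notebook kernel that is still running) are already holding.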

utkb · September 2, 2018, 7:33am

At a later part of the notebook, for the commands:
IMDB_LABEL = data.Field(sequential=False)
splits = torchtext.datasets.IMDB.splits(TEXT, IMDB_LABEL, 'data/')
it seems to want to download an aclImdb_v1.tar.gz.

I just renamed the aclImdb.tgz to aclImdb_v1.tar.gz within the ./data/ folder, and it seems to work fine, without having to re-download anything.

Note that when I googled aclImdb_v1.tar.gz, I found this file, which does not seem to be the right file to use…! Maybe it is just a different / outdated file for a previous version of the example? This Stanford file was breaking the following commands:
t = splits[0].examples[0]
t.label, ' '.join(t.text[:16])

Thanks.
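
The rename described above can also be done from the notebook. This is a minimal sketch that copies the archive instead of renaming it, so the original file stays in place; the function name ensure_archive_alias is hypothetical, and the file names are the ones given in the post.

```python
import shutil
from pathlib import Path


def ensure_archive_alias(data_dir, existing="aclImdb.tgz",
                         expected="aclImdb_v1.tar.gz"):
    """Make the already-downloaded archive available under the expected name."""
    data_dir = Path(data_dir)
    src = data_dir / existing
    dst = data_dir / expected
    if dst.exists():
        return dst  # nothing to do, the expected file is already there
    if not src.exists():
        raise FileNotFoundError(f"{src} not found; nothing to copy")
    shutil.copyfile(str(src), str(dst))
    return dst
```

For example, ensure_archive_alias('data/') would be called once before the IMDB.splits line. Whether torchtext then skips the download depends on the torchtext version, so treat this as something to try, not a guaranteed fix.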

MohamedELshazly · October 13, 2018, 1:19pm

I’m getting an error in Kaggle kernels and Colab saying that there’s no module named fastai.learner. The error is triggered by these lines:

from fastai.rnn_reg import *
from fastai.rnn_train import *
from fastai.nlp import *
from fastai.lm_rnn import *

And since you can’t add a custom package in Kaggle kernels while the GPU is on, I tried installing the fastai package from the GitHub link in Colab, but to no avail.
Any help would be appreciated!!
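
A quick way to narrow this down is to check which of the modules named in the error and the imports can actually be found in the current environment. The sketch below uses only the standard library; the helper name is_importable is hypothetical. One possible cause, stated here as an assumption, is that the installed fastai version lays out its modules differently from the version the course notebooks were written against.

```python
import importlib.util


def is_importable(name):
    """Return True if the module can be located, False otherwise."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when the parent package itself is missing or malformed.
        return False


# Module names taken from the error message and the import lines above.
MODULES = [
    "fastai",
    "fastai.learner",
    "fastai.rnn_reg",
    "fastai.rnn_train",
    "fastai.nlp",
    "fastai.lm_rnn",
]

for name in MODULES:
    status = "found" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

If fastai itself is found but the submodules are missing, the package is installed and the version is the more likely problem; if fastai is missing too, the install step did not take effect in that kernel.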