Lesson 4 In-Class Discussion

It’s exactly the same intuition as using a pre-trained ImageNet model to improve an image classifier. Word vectors, however, are the result of just the first layer of a model (effectively).
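To make the analogy concrete, here’s a tiny sketch (made-up sizes, with random numbers standing in for real word2vec/GloVe vectors): the pre-trained word vectors would just initialise the first embedding layer, whereas an ImageNet model gives you a whole stack of pre-trained layers.

import torch
import torch.nn as nn

vocab_sz, em_sz = 10000, 300                 # hypothetical vocabulary / embedding sizes
emb = nn.Embedding(vocab_sz, em_sz)          # the "first layer" of a text model
pretrained = torch.randn(vocab_sz, em_sz)    # stand-in for real pre-trained word vectors
emb.weight.data.copy_(pretrained)            # initialise the embedding layer with them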


Thanks Jeremy. I’m currently working on a negative-reviews classifier without a large dataset to train on.

Going by that intuition, instead of pre-training a “predict the next word” model and later connecting a classifier to it - would it be better if I were to build a text classifier based on a big dataset (like the ones from Project Detox, since the text and problem are more relevant), and connect that to another text classifier (which will be trained on my small dataset)?

Yes, that might be better - or better still, pre-train a language model, then fine-tune that on Project Detox et al., then fine-tune on your dataset.
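The glue between those stages would be the save_encoder / load_encoder calls from the lesson notebook. Very roughly (a sketch only - 'lm_learner', 'detox_learner' and 'my_learner' are hypothetical learners you’d build from the three datasets in the usual way, and the hyperparameters are just the ones used in the notebook):

def staged_finetune(lm_learner, detox_learner, my_learner):
    # 1. pre-train the language model on a large general corpus
    lm_learner.fit(3e-3, 4, wds=1e-6, cycle_len=1, cycle_mult=2)
    lm_learner.save_encoder('lm_enc')
    # 2. fine-tune on the more relevant Project Detox text
    detox_learner.load_encoder('lm_enc')
    detox_learner.fit(3e-3, 3, wds=1e-6, cycle_len=1)
    detox_learner.save_encoder('detox_enc')
    # 3. finally fine-tune on your own small dataset
    my_learner.load_encoder('detox_enc')
    my_learner.fit(3e-3, 3, wds=1e-6, cycle_len=1)
    return my_learner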


Just noting that the ACL IMDB file linked in the notebook (http://files.fast.ai/data/aclImdb.tgz) isn’t actually a .tgz file - it’s just a straight .tar

I noticed this too. I used the command

$ tar -xvf aclImdb.tgz

to extract it, in case anyone else has trouble.

You’ll also need to create a “models” subdirectory in the data/aclImdb/ directory.
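From a notebook cell that’s just something like this (assuming PATH points at data/aclImdb/, as in the lesson):

import os

os.makedirs(f'{PATH}models', exist_ok=True)  # create the models/ subdirectory if it doesn't exist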

What’s the best way to save and reload the prepared and tokenized LanguageModelData (because building it is pretty slow)?

I see:

pickle.dump(TEXT, open(f'{PATH}models/TEXT.pkl','wb'))

but I can’t see how to reload that into the LanguageModelData. My first thought was to pickle LanguageModelData itself, but that is a generator.

load_model = pickle.load(open(filename, 'rb'))

No, that doesn’t work:

My first thought was to pickle LanguageModelData itself, but that is a generator.

To expand on that a bit - you can’t pickle generators, unless I’m missing something:

pickle.dump(md, open(f'{PATH}models/lang_model.pkl','wb'))

gives:

TypeError: can't pickle generator objects

I think the part that takes up time is building the vocab field onto TEXT, which is slow because the whole corpus has to be parsed each time. I looked at the source, and it should skip this step if you pass in a TEXT object that already has the vocab populated, but when I tested this hypothesis it still took forever to build the darn model. So I’m not sure…
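For reference, the reload I was testing looked roughly like this (same FILES, bs and bptt variables as in the notebook):

import pickle

TEXT = pickle.load(open(f'{PATH}models/TEXT.pkl', 'rb'))   # TEXT with the vocab already built
md = LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=10)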

I only just added that check FYI, and it’s not well tested.

What are the likely GPU memory requirements of using a bptt of 70 in the lesson4-imdb notebook? I’m running out of memory on my 8GB GTX 1070 and wondered what the most effective way of reducing the memory requirements is - reducing bptt or reducing bs?

Any thoughts?

@al3xsh:

I’m successfully using bs=32 and bptt=70 on an 8GB 1070.

I get:

RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1512387374934/work/torch/lib/THC/generic/THCStorage.cu:58

when training the model with learner.fit(3e-3, 4, wds=1e-6, cycle_len=1, cycle_mult=2).

However, the graphics card is rendering X as well as training the language model, so some of the memory (approx. 350-450 MiB) is taken up with that too. Having tracked memory usage through nvidia-smi, there is not much headroom on the 8GB of memory.

I think part of the problem might also be that, as Jeremy discusses in the video, the bptt = 70 parameter is not 100% fixed, so the size of each batch can vary somewhat.

I’ll try using a bptt of 65 and see if that improves things …
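Concretely, I’ll just lower the values where the model data gets built (same call as in the notebook; the numbers are only what I plan to try):

bs, bptt = 32, 65   # smaller than the notebook's values to reduce GPU memory use
md = LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=10)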

Cheers,

al3xsh


Are there any resources explaining when/how to use signal values for missing values in deep learning models?

Yes the machine learning course covers those ideas reasonably well.


Ah great… Looking forward to going through those materials as well… Thanks!

Hi!
I get a similar error:

RuntimeError: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch_1512387374934/work/torch/lib/THC/THCGeneral.c:70

even though I use only bptt=50 and have made a smaller dataset (200 files in the training set and 100 files in the test set). Could you please tell me if you solved your problem? I use the conda environment provided by the course in January, and the version of PyTorch in that environment is 0.3.0.

And if anyone else has a suggestion, I would be glad to read it.

Thanks in advance.

EDIT: I shut down my computer and started again, and this time the error didn’t occur. I imagine that my GPU (I run things locally) had reached its limit, but after the reboot it was fresh again and, with fewer applications running at the same time, it could perform the task. That said, these issues of GPU capacity are way above my head. Does anyone have a simple tutorial on how to manage them?
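In case it helps anyone else starting out: besides keeping nvidia-smi open in a terminal, a quick sanity check from Python before training is something like this (just confirming PyTorch can see the GPU):

import torch

print(torch.cuda.is_available())       # True if PyTorch can see a CUDA GPU
print(torch.cuda.device_count())       # how many GPUs are visible
print(torch.cuda.get_device_name(0))   # name of the first GPU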


At a later part of the notebook, the commands:
IMDB_LABEL = data.Field(sequential=False)
splits = torchtext.datasets.IMDB.splits(TEXT, IMDB_LABEL, 'data/')
seem to want to download an aclImdb_v1.tar.gz.

I just renamed the aclImdb.tgz to aclImdb_v1.tar.gz within the ./data/ folder, and it seems to work fine, without having to re-download anything.
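i.e. just something like this from a notebook cell (or the equivalent mv in a terminal):

import os

os.rename('data/aclImdb.tgz', 'data/aclImdb_v1.tar.gz')  # path assumes the file sits in ./data/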

Note that when I googled aclImdb_v1.tar.gz, I found this file, which does not seem to be the right file to use…! Maybe it is just a different / outdated file for a previous version of the example? This Stanford file was breaking the following commands:
t = splits[0].examples[0]
t.label, ' '.join(t.text[:16])

Thanks.

I’m getting an error in Kaggle kernels and Colab saying that there’s no module named fastai.learner, and the error is also triggered by these lines:

from fastai.rnn_reg import *
from fastai.rnn_train import *
from fastai.nlp import *
from fastai.lm_rnn import *

And since you can’t add a custom package in Kaggle kernels while the GPU is on, I tried installing the fastai package from the GitHub link in Colab, but to no avail.
Any help would be appreciated!
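For reference, what I tried in Colab was roughly this; I’m not certain installing current master is even the right move, since the old course notebooks expect the 0.7.x API, so pinning that version (untested on my side) might be what’s actually needed:

!pip install git+https://github.com/fastai/fastai.git   # what I tried - current master
!pip install fastai==0.7.0                              # possibly the version the old notebooks expect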