Lesson 4 In-Class Discussion ✅

Thanks for the link.
Pretty useful.


Yes, that kind of sums up my feeling as well. And that is, I suppose, part of the problem: it’s a feeling. Really one wants a ground truth to compare against, to see how much the sigmoid skews the results. But I’m actually fine with it for the time being.

I’m running on a server with 1080Ti (11GB) cards, and I keep running out of memory halfway through training. I don’t quite understand what’s going on, because no matter what batch size I use, it seems to bomb out halfway through with the following error. When monitoring the VRAM, it sits at about 10GB but then explodes halfway through. I wonder if it has something to do with gradual unfreezing.

CUDA out of memory. Tried to allocate 1.13 GiB (GPU 0; 10.92 GiB total capacity; 4.55 GiB already allocated; 718.19 MiB free; 4.12 GiB cached)

I have a couple of questions regarding NLP usage.

  1. How do we handle multiple classes, and is there any way to handle imbalance in the data?
  2. Is there a way to switch off the unique code generation and retain those words?

Can anyone explain what is meant by "We have to use a special kind of TextDataBunch for the language model, that ignores the labels (that’s why we put 0 everywhere)"? Which labels are being referred to here? The labels are the negative and positive reviews used for classification. In the first language model, the field is left empty.

Just came across this while reading through the chat:

I stated something similar in the dev chat while waiting for the feed to return this morning. Maybe you could expand on how it is better and more flexible, and maybe give some hints on transitioning “old” code (well, it’s a week old; data_block was only introduced to us in the last lesson and could still handle Datasets then…)

During transfer learning in image classification, we freeze all the initial layers, chop off the existing last few layers, replace them with new layers based on our number of classes, and then train only those newly added layers.

What is the role of freezing the layers in the language model in the case of transfer learning? Which are the last few layers that we chop off and replace with new layers to train initially? For the encoder part we are only doing unsupervised learning, i.e. there are no labels. So which layers are we initially training?

The idea here is that in the language model part we are saying: here’s a load of movie review text to fine-tune the language model that was built on Wikipedia text. It is our ‘domain’ text. Whether the review is positive or negative is irrelevant. In the same way, we can use the text from the test set. We are just trying to get as big a corpus of domain language as we can to fine-tune the model.

The labels are only needed later on, when we are using the language model to predict whether the review is positive or negative.

I was able to create a language model and predict sentences with it. I have a CSV file with 3 columns: the text is in the 2nd column and my label is in the 3rd column. When I try to build a classifier using the language model I built, I am not able to build a data bunch from my CSV file in the way the IMDB classifier was built from a folder in today’s class. Can someone help me with this?


Try this: the labels should come first; last column should be the text
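For a three-column CSV like the one described above, the reordering could be sketched roughly as follows. This is only an illustration: the column layout (id, text, label) and the sample rows are made up, the helper name reorder_label_first is hypothetical, and it uses only the standard csv module rather than anything from fastai. Note that it drops any column other than the label and the text.

```python
import csv
import io

def reorder_label_first(src, dst, label_idx=2, text_idx=1):
    """Copy rows from src to dst, writing the label first and the text last."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        # Keep only label and text; any other column (e.g. an id) is dropped.
        writer.writerow([row[label_idx], row[text_idx]])

# Hypothetical sample data: id, text, label (text in column 2, label in column 3).
sample = io.StringIO('1,"Great movie, loved it",positive\n2,Terrible plot,negative\n')
out = io.StringIO()
reorder_label_first(sample, out)
reordered = out.getvalue()
print(reordered)
```

With real files, the same function can be given open file handles (opened with newline='') instead of the in-memory buffers used here.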

I am getting TypeError: split_from_df() got an unexpected keyword argument 'cols' when running this command from the imdb notebook:

data = (TextList.from_csv(path, 'texts.csv', col='text')

Can somebody help?

If I’m not mistaken, when asked about the magic number 2.6**4 in one of the learning rates, Jeremy explained the 2.6 but said the **4 would be explained later in lesson 4. Did I miss it? Why is it raised to the fourth power?

I removed “cols=” and just left the number in there to get past the error.

This may work too:

data = (TextList.from_csv(path, 'texts.csv', col='text')

He didn’t explain it in the video. Maybe he will explain it later.
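One way to look at the arithmetic, offered as a sketch rather than as the explanation from the lesson: if the learning rates are spread geometrically between the two ends of slice(lr/(2.6**4), lr), and the model has five layer groups (an assumption here), then there are four steps between the five groups and each group gets a learning rate 2.6 times that of the group before it. The helper name geometric_lrs and the value lr = 1e-2 are just for illustration.

```python
def geometric_lrs(lo, hi, n_groups):
    """Return n_groups learning rates spaced by a constant factor from lo to hi."""
    if n_groups == 1:
        return [hi]
    step = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * step ** i for i in range(n_groups)]

lr = 1e-2  # example value, chosen for illustration
lrs = geometric_lrs(lr / (2.6 ** 4), lr, n_groups=5)  # 5 groups is an assumption

# Ratio between each pair of neighbouring layer groups.
ratios = [b / a for a, b in zip(lrs, lrs[1:])]
print(lrs)
print(ratios)
```

Under these assumptions every entry in ratios comes out at about 2.6, which would account for the exponent being one less than the number of layer groups.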

Have you tried restarting the notebook kernel after getting the CUDA memory error? This resolved the issue for me. I am able to train the IMDB notebook with bs=50 on a 1080Ti (11GB).

The correct syntax is:

data = (TextList.from_csv(path, 'texts.csv', col='text')

This is for fastai version:

import fastai
fastai.__version__
'1.0.24'

I ran into the same problem.
I trained the model up to: data_lm.save('tmp_lm')
Then I reset the kernel and loaded the data skipping the step that saved the tmp_lm. I loaded from: data_lm = TextLMDataBunch.load(path, 'tmp_lm')
And decreased the back propagation through time from 70 to 50.
learn = language_model_learner(data_lm, pretrained_model=URLs.WT103, drop_mult=0.3, bptt=50)
From there on I could run the remainder of the notebook.


@gbecon @seppedl I’m not sure exactly what fixed it, but thanks. Initially restarting the kernel didn’t help. But, I pulled the latest library (went from 1.0.22 to 1.0.24), pulled the latest lesson 3 notebook (had changed a fair bit). Then tried:

  • bs = 50
  • bptt = 50
  • Restarted the kernel after creating data_lm and then started from data_lm = TextLMDataBunch.load(path, 'tmp_lm')

Now it seems to be working. It’s a beasty RNN! I’ve done some stuff before that maxed my GPU, but not with bptt at 50, this was at a few hundred!


Spoke a bit too soon. It died again when I started to unfreeze it. Thankfully, killing the kernel and then calling learn.load('fit_head') seems to be working.

I suspect it’s a PyTorch issue, it seems to have issues leaving stuff in VRAM. I’ve had similar issues like this before. I guess you often get away with it because you’re not maxing it out.
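If it is cached memory that lingers, something along these lines might be worth trying before killing the kernel. It is a sketch, not a confirmed fix for the problem in this thread: the helper name release_cached_gpu_memory is made up, and it can only help once the references to the learner and its tensors have been deleted (for example with del learn). It returns None when PyTorch or a CUDA device is not available.

```python
import gc

def release_cached_gpu_memory():
    """Run garbage collection, release PyTorch's cached CUDA blocks,
    and return the bytes still allocated (None if no CUDA is available)."""
    gc.collect()  # drop unreachable Python objects that still hold tensors
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
    return torch.cuda.memory_allocated()

still_allocated = release_cached_gpu_memory()
print(still_allocated)
```

Memory held by live tensors is not freed by this; only blocks that PyTorch has cached but is no longer using are released.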