Lesson 4 In-Class Discussion ✅

I feel it depends on which part of the range is most important, whether you normalise, and what your true distribution looks like. A sigmoid is quite linear in the ‘normally interesting’ middle part of the range.
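To put rough numbers on "quite linear in the middle", here is a small sketch that compares the sigmoid with its tangent line at zero. The helper names and the sample points are my own choices for illustration, not something from the lesson.

```python
import math

def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def linear_approx(x):
    """Tangent line to the sigmoid at x = 0 (value 0.5, slope 0.25)."""
    return 0.5 + x / 4.0

# Absolute gap between the sigmoid and its tangent at a few sample points
# (the points are arbitrary, chosen only for illustration).
errors = {x: abs(sigmoid(x) - linear_approx(x)) for x in (-4, -2, -1, 0, 1, 2, 4)}

for x, err in errors.items():
    print(f"x={x:+d}  sigmoid={sigmoid(x):.4f}  tangent={linear_approx(x):.4f}  gap={err:.4f}")
```

The gap stays small for inputs near zero and grows quickly towards the ends, which is where the squashing would matter most if the extremes of the range are the important part.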

Can we use the WT103 pre-trained model for non-English languages that have different grammar rules than English?

In the case of Sinhala (the language of Sri Lanka), we normally use Sinhala and mix it with some English. The English words are usually nouns.

I think in this case, I need to build a proper language model first of all.
But I’ll try with WT103 and see how it goes.

I highly recommend using pdb.set_trace() and just stepping through the layers of a model and looking at the dimensions change.
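A minimal sketch of that idea in plain Python, with no deep learning library involved: the `linear`, `relu`, `shape` and `trace_shapes` helpers are hypothetical stand-ins for real model layers, written only to show where the breakpoint would go and what to look at.

```python
import pdb  # standard-library debugger

def shape(x):
    """Return the dimensions of a nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

def linear(weights):
    """Build a layer that multiplies a batch of rows by a weight matrix."""
    def layer(batch):
        return [[sum(a * w for a, w in zip(row, col)) for col in zip(*weights)]
                for row in batch]
    return layer

def relu(batch):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in batch]

def trace_shapes(layers, batch, debug=False):
    """Run the batch through each layer and record the shape after every step."""
    shapes = [shape(batch)]
    for layer in layers:
        if debug:
            pdb.set_trace()  # pause here; inspect shape(batch), then 'n' to step
        batch = layer(batch)
        shapes.append(shape(batch))
    return shapes

w1 = [[0.1] * 3 for _ in range(4)]   # 4 inputs -> 3 hidden units
w2 = [[0.2] * 2 for _ in range(3)]   # 3 hidden units -> 2 outputs
batch = [[1.0, 2.0, 3.0, 4.0]] * 5   # 5 rows, 4 features each

shapes = trace_shapes([linear(w1), relu, linear(w2)], batch)
print(shapes)
```

Calling `trace_shapes(..., debug=True)` drops into the debugger before each layer, so you can check the dimensions yourself instead of reading the printed list.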


The learned entity embeddings for categorical variables can be used in subsequent tabular-data models. So, that’s one way of using transfer learning for tabular data.

Thanks @jeremy, @rachel and all at Fast.ai for another great lesson! The Excel spreadsheet was a great way of visualizing and understanding what is going on under the hood, together with going through the layers and the calculations of parameters and activations in the last diagram Jeremy drew. Peeling away the layers like this gives such great insights and builds an intuition that just hasn’t been available to me previously. It all meshes nicely with the previous lessons as we get into a deeper understanding on our way back.
So grateful for you all making this accessible and explaining all the moving parts in such an understandable and succinct way.

Also picked up some neat insights into the power of Excel!
And of course a happy birthday to Jeremy!


I think Jeremy mentioned Model Zoo in this context.


Thanks for the link.
Pretty useful.


Yes, that kind of sums up my feeling as well. And that is, I suppose, part of the problem: it’s a feeling. Really one wants to have a ground truth to compare against, to see how much the sigmoid skews the results. But I’m actually fine with it for the time being.

I’m running on a server with 1080Ti (11GB) cards, and I keep running out of memory halfway through training. I don’t quite understand what’s going on, because no matter what batch size I use it seems to bomb out halfway through with the following error. When monitoring the VRAM, it sits at about 10GB but then explodes halfway through. I wonder if it’s something to do with gradual unfreezing.

CUDA out of memory. Tried to allocate 1.13 GiB (GPU 0; 10.92 GiB total capacity; 4.55 GiB already allocated; 718.19 MiB free; 4.12 GiB cached)

I have a couple of questions regarding NLP usage.

  1. How do we handle multiple classes, and is there any way to handle imbalance in the data?
  2. Is there a way to switch off the unique code generation and retain those words?

Can anyone explain what is meant by "We have to use a special kind of TextDataBunch for the language model, that ignores the labels (that’s why we put 0 everywhere)"? What labels are being mentioned here? The labels are negative and positive reviews for classification. In the first language model, the field is left empty.

Just came across this while reading through the chat:

I stated something similar in the dev chat while waiting for the feed to return this morning. Maybe you could expand on how it is better and more flexible, and maybe give some hints on transitioning “old” code (well, it’s a week old; data_block was only introduced to us in the last lesson and could still handle Datasets then…)

During transfer learning in image classification, we freeze all the initial layers, chop off the existing last few layers, replace them with new layers based on our number of classes, and then train only those newly added layers.

What is the role of freezing the layers in the language model in the case of transfer learning? Which are the last few layers that we chop off and replace with new layers that we train initially? For the encoder part we are only doing unsupervised learning, i.e. there are no labels. So which layers are we initially training?

The idea here is that in the language model part we are saying: here’s a load of movie review text to fine-tune the language model that was built on Wikipedia text. It is our ‘domain’ text. Whether the review is positive or negative is irrelevant. In the same way, we can use the text from the test set. We are just trying to get as big a corpus of domain language as we can to fine-tune the model.

The labels are only needed later on, when we are using the language model to predict whether the review is positive or negative.

I was able to create a language model and was able to predict sentences. I have a CSV file which has 3 columns; the text is in the 2nd column and my label is in the 3rd column. When I try to build a classifier using the language model I built, I am not able to build a data bunch from my CSV file similar to the IMDB classifier from folder which was discussed in today’s class. Can someone help me with this?


Try this: the labels should come first; last column should be the text
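In case it helps, here is a rough sketch of reordering the columns with the standard `csv` module before loading the file. It assumes the layout described above (text in the 2nd column, label in the 3rd); the sample rows, the `id` column and the `reorder_columns` helper are made up for illustration.

```python
import csv
import io

def reorder_columns(src, dst, label_idx=2, text_idx=1):
    """Copy rows from src to dst, keeping only label (first) and text (last)."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        writer.writerow([row[label_idx], row[text_idx]])

# Made-up stand-in for the 3-column file; with a real file you would
# pass open(...) handles (opened with newline='') instead of StringIO objects.
raw = io.StringIO(
    'id,text,label\n'
    '1,"Great film, loved it",pos\n'
    '2,Dull and slow,neg\n'
)
out = io.StringIO()
reorder_columns(raw, out)

rows = list(csv.reader(io.StringIO(out.getvalue())))
print(rows)
```

Using `csv.reader` rather than splitting on commas keeps quoted review text that contains commas in one piece.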

I am getting TypeError: split_from_df() got an unexpected keyword argument 'cols' when running the imdb notebook command

data = (TextList.from_csv(path, 'texts.csv', col='text')

Can somebody help?

If I’m not mistaken, when asked about the magic number 2.6**4 in one of the learning rates, Jeremy explained the 2.6 but said the **4 would be explained later in lesson 4. Did I miss it? Why is it raised to the fourth power?
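One possible reading, which is my assumption and not something confirmed in this thread: if the learning rate shrinks by a factor of 2.6 from each layer group to the one before it, and there are five layer groups, then the earliest group ends up 2.6**4 times below the last one. The sketch below only shows that arithmetic; `layer_group_lrs`, `n_groups=5` and the `1e-2` maximum rate are hypothetical.

```python
def layer_group_lrs(max_lr, factor=2.6, n_groups=5):
    """Geometrically spaced learning rates, smallest for the earliest group.

    n_groups=5 is an assumption made for this illustration.
    """
    return [max_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]

lrs = layer_group_lrs(1e-2)

print(f"2.6**4 = {2.6 ** 4:.4f}")
for i, lr in enumerate(lrs):
    print(f"group {i}: lr = {lr:.6f}")
```

With five groups there are four steps between the first and the last, which is where a fourth power would come from under this reading.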

I removed “cols=” and just left the number in there to get past the error.