A Code-First Introduction to Natural Language Processing 2019

I found in NLP lectures that we are using x[:, i] as input instead of x[i] when we want to predict the next number, though, numbers are arranged in x[i] order. Why are we doing so?lesson7-human-numbers%20ipynb%20-%20Colaboratory

Hi @rachel,

I was going through the chapter on seq2seq translation.
I wanted to request you to upload the final ‘questions_easy.csv’ (which is used to creating the final databunch) to the S3 datalake where fastai has hosted all the other datasets.

The problem is creating the datasets uses a lot of resources. I have tried running it on Colab and a couple of other machines but wasn’t able to. It would be tremendously helpful if the ready dataset is hosted somewhere.

Thanks in advance.
Thank you for the wonderful course as well!


1 Like

A post was split to a new topic: Remote NLP Study Group Saturdays at 8 AM PST, starting 12/14/2019

Hi i downloaded a sentiment analysis dataset, a movie review dataset.
The structure of dataset is like this:
1st column: Movie ID
2nd column: movie review (text in english)
3rd column: ratings (numbers in decimal from 0 to 5)

so to predict the rating this has to be a tabluar model but for the model to interpret what the review means, another language model has to created and has to be used with the tabular data.

How do i do that ?


Hi @AjayStark, have a look at @quan.tran’s helpful post & github on mixed tabular & text, it worked well for me:


Hii, Thank you. I’ll Look into it :smiley:

Hi @AjayStark ,

If the task is just sentiment analysis that can be done using only the text_classifier.
The example below is an example of if you have both text as well as tabular data.

For sentiment analysis there are two possible ways you can achieve this:
(a) Classification:
You can treat each rating as a separate category. So the categories you will have are 0,1,2,3,4,5. And you can train a simple classifier. Comparing this to the example from Jeremy’s course. In the course, there were two classes: pos and neg. You have 6 classes here.

(b) Regression:
Instead of treating the ratings as separate classes you can treat them as a continuum. This is covered in bits and pieces/ inspired from the fastai.collab module. First, let’s see what’s happening in the collab module. The concept of ratings can be adopted directly. If the ratings are between 0-5 we first normalize them to 0-1. The main difference comes here, instead of training using the CrossEntropy loss which is used for classification tasks, the MSE loss function is used. The same can be done here.
You can check out the implementation of the collab module that you need to look at here. Checkout how the y_range is being used for the normalization.

So you take a text_classifer module, normalize the ratings. One point to add here is Jeremy in his lectures mentions from practical experience when using ratings between 0-5 normalize them between 0-5.5 because it’s a bit tricky for the model to predict values at the extremities. Then use the MSELoss to train the module.

Hope this helps.

EDIT: I’ve never experimented with sentiment analysis. Thus I cannot say which approach will work better. In my opinion having been through Jeremy’s course I think MSE loss should work better. However expeirments should be relied upon rather than intuition.

1 Like

:sunny: Notice of Serendipity If You Want To Learn NLP!:sunny:

Here’s a great opportunity if you plan to participate in the upcoming TWiML AI x Fastai Study Group for the course A Code-first Introduction to Natural Language Processing.

This Saturday, we will begin a three-week introduction to NLP from Lesson 12 of Deep Learning From the Foundations. Though not a required prerequisite, for the Fastai NLP course, these three sessions should provide a great introduction to and overview of NLP!

Time: 8:30 AM Pacific Time on Saturdays Nov 23rd, Nov 30th, and Dec 7th.
Place: In this Zoom chatroom.

Stay tuned on this channel for an announcement the details of this Saturday’s meeting!

The TWiML AI x Fastai NLP Study Group will meet Saturdays, beginning Dec. 14th, at 8:00 AM, a week after completion of the 3-part mini-review.

1 Like

Issue: Preemptible GCP instances

I’m a complete beginner to ML ( and programming, sort of), and am doing this course.
I am using GCP free credits, preemtible instance as per the instructions in the fastai documentation.

When I try to run train the IMDB full set after loading the wikitext model, the instances keeps
getting shut down at various stages of the training.

What happens to my model / data when I restart?
Is it possible to restart the training from where it was left off? I have been through this cycle about four times, and I am getting sisyphus nightmares.

Besides deleting this instance and starting a non-premtible one, is there a safe way to do this?

Thank you



The Deep Learning From the Foundations Study Group
will meet Saturday Nov 23rd, at 8:30 AM Pacific Time

Topic: Lesson 12, Text Data Preprocessing

Suggested Homework/preparation:

  1. Watch the Lesson 12 video from 1:15:10 to 1:45:17

  2. Familiarize with the notebook 12_text.ipynb

Join the Zoom Meeting when it’s time
To join via phone
Dial US: +1 669 900 6833 or +1 646 876 9923
Meeting ID: 832 034 584

The current meetup schedule is here

Sign up at Sam Charrington’s TWiML & AI x fast.ai Slack Group to receive meetup announcements via email

1 Like

Hi @zerochi,

If you’re talking about Colab then there’s no way to retain the instance.
But what you can do is connect the colab notebook to your drive and save databunches/models which can then be reused.

Stackoverflow answer is here

Essentially it’s this:

from google.colab import drive

And just another tip, change the runtime instacne to GPU. It’s not GPU by default.

Runtime -> Change runtime type

Second, if you somehow manage to crash the notebook due to excessive consumption of RAM Colab automatically suggest upgrading your RAM to 25GB.

Yes, Colab is made for short experiments any long standing process that you leave on and close the notebook will eventually be stopped.
There are a couple of ‘tricks’ JS script to click on the Connect button which apparently hepls in keeoing it connected. But I can’t find it right now.


1 Like

Trying to reproduce the NLP transfer learning from nn-imdb-more.ipynb on Google Colab.

After creating data_lm with my own data and checking the length of vocab.itos and vocab.stoi I’m getting 18744 and 73037 which is ok.

But after data_lm.save('lm_databunch') and data_lm = load_data(path, 'lm_databunch', bs=bs) and checking again vocab.itos and vocab.stoi – now I’m getting 18744 and 18742.

How is this possible? Or is this my mistake?


Thank you. I am not running this on colab, but on google cloud hosting. on google cloud I created a preemptible compute engine. I run jupyter notebooks on this compute engine.

My current understanding is that the data is gone, because when training is interrupted, that’s what happens.



Hi @zerochi. Save your model every epoch (or whenever you like) during training with:
Like this at least you don’t lose too much when your instance gets preempted/interrupted.

1 Like

Thank you! this fixes my issue.

1 Like

How can I use this Vietnamese notebook course to create next word predictor.

I have the .pth model and the vocab as .pkl in Serbian language created.

Now, I don’t plan to learn any more:

learn_lm = language_model_learner(data_lm, 

I would like to predict. Is there any notebook that shows how to do that?

I guess it is here. The predict method looks good.

During our discussion of the paper
The Curious Case of Neural Network Degeneration in connection with Lesson #14, we wondered

"How does the temperature parameter affect the distribution of word probabilities?"

I made a Jupyter notebook in which I empirically answer the above question and also discuss the decoding strategies mentioned in the paper.

Hello @pierreguillou , I think you can save your money for small tasks by using google colab workspace


: is all the ‘vertical’ rows (each row has bptt tokens)
-1 is the last token of each row

So for the whole batch (bs rows), the last word is the target. Think that when working with Tensors, most probably there is always a batch dimension (even if the batch has only one element).

Later in the lesson the model is meant to predict all the words expect the first one (with 1 > predict 2, with 1 and 2 > predict 3, …) but in this moment we were taking:
input : words 1…n-1
target: word n