A walk with fastai2 - Text - Study Group and Online Lectures Megathread

Re: Lecture tonight, I’ll post a recording tonight or tomorrow morning. My internet is far too unstable for live streaming :frowning:


My best guess (before I’ve had the chance to dig into the notebook and supporting code) at why data_lm.vocab has 7080 entries while data_lm.o2i.items() has 7050, with the remaining words mapped to the unk token, is this: typically, when a word occurs fewer than a certain number of times in the text (say 5), it is mapped to unk. That is probably what is going on here, but I’ll need to dig through the notebook and code to confirm.
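To illustrate the frequency-threshold idea in the guess above, here is a minimal plain-Python sketch. The `xxunk` token name, the `min_freq`/`max_vocab` parameters and both helper functions are assumptions for illustration; this is not fastai’s actual vocab-building code.

```python
from collections import Counter

UNK = "xxunk"  # assumed name of the unknown token, kept at index 0

def build_vocab(token_lists, min_freq=3, max_vocab=60000):
    """Keep tokens seen at least `min_freq` times, most frequent first, after UNK."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    kept = [tok for tok, c in counts.most_common(max_vocab) if c >= min_freq and tok != UNK]
    return [UNK] + kept

def numericalize(tokens, o2i):
    """Look up each token's index; anything not in the vocab falls back to 0 (UNK)."""
    return [o2i.get(tok, 0) for tok in tokens]

docs = [["the", "movie", "was", "great"],
        ["the", "movie", "was", "dull"],
        ["the", "plot", "was", "thin"]]
vocab = build_vocab(docs, min_freq=2)
o2i = {tok: i for i, tok in enumerate(vocab)}
print(vocab)                                              # ['xxunk', 'the', 'was', 'movie']
print(numericalize(["the", "plot", "was", "dull"], o2i))  # [1, 0, 2, 0]
```

Words seen only once ("plot", "dull") never make it into the vocab, so they come out as index 0, the unk token.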


What is a tokenizer?

Is data_lm.o2i.items() generated with the help of spaCy tokenizers?
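A tokenizer splits raw text into tokens (words, punctuation, sub-pieces) before anything is turned into numbers. As far as I know, fastai2’s default word tokenizer wraps spaCy; the regex below is only a rough stand-in for that, written for illustration.

```python
import re

def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("This movie wasn't great!")
print(tokens)  # ['This', 'movie', 'wasn', "'", 't', 'great', '!']
```

A spaCy-based tokenizer handles this case better, splitting "wasn't" into "was" and "n't". Either way, the resulting tokens are what get counted to build the vocab, and `o2i` is the token-to-index lookup built from that vocab.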

@muellerzr Hope things are well on your end! Just wondering when the next lecture is?


@muellerzr just wondering when the next lecture will be?

I’m afraid there won’t be one. After starting this, I realized my NLP knowledge needs a bit more work before I dive into this, and my time constraints limit it too. I’d recommend Rachel’s course instead; perhaps a few of you following this could even combine your efforts to recreate the notebooks in fastai2! Apologies :slight_smile:


@muellerzr I am looking for an NLP classification model for the Italian language. Any suggestions?

Are you looking for a pre-trained model or you’d like to train one from scratch?


I see HuggingFace have a community-submitted Italian BERT model here that you could try using: https://huggingface.co/models?search=italian

See below for how to use it with fastai2

From Scratch

Pre-train data:
You can have a look at the scripts here to download all Italian Wikipedia articles: https://github.com/fastai/fastai/tree/0a6f3894cd4881c0f4799d8f7533d20c6077a0dc/courses/dl2/imdb_scripts

And then you can consider whether to use the AWD_LSTM model or a transformer:


Fastai wikitext tutorial using AWD_LSTM to pre-train a language model and fine-tune for classification:

Transformer options

My FastHugs notebooks: https://github.com/morganmcg1/fasthugs

  • First use the language model notebook to pre-train, then use the sequence classification model to do classification

@Richard-Wang has also done pre-train and fine-tuning of transformers here: Pretrain MLM and fintune on GLUE with fastai - 1 - Masked laguage model callback and Electra callback

@wgpubs recently released a library for using HuggingFace transformers. As of writing I don’t think you can pre-train with it yet, but the classification part should work: https://ohmeow.github.io/blurr/

Sylvain also released a fastai transformers tutorial. Right now it only covers text generation, but it’s worth a look to see how he integrates HF and fastai: http://dev.fast.ai/tutorial.transformers

One disadvantage of training transformers from scratch is that their impressive results have come from using really huge amounts of data, and they take a long time to pre-train. So I would either start with a pre-trained transformer model or pre-train an AWD_LSTM.

Other Italian models

I found this thread from fastai v1 which is worth a look too: ULMFit - Italian - v1


Hi @muellerzr, once the language model is created, it understands the language, so from there is it possible to take it to chatbots? Has anybody worked on this?

They’re similar, alright. Have a look at the DialoGPT chatbot model in HuggingFace’s docs:

It looks like they trained it as an LM. From the paper DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation:

We follow the OpenAI GPT-2 to model a multiturn dialogue session as a long text and frame the generation task as language modeling. We first concatenate all dialog turns within a dialogue session into a long text x_1,…, x_N (N is the sequence length), ended by the end-of-text token.
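The quoted setup can be sketched in a few lines: join the turns of a session into one string, then train on next-token prediction, where the targets are the inputs shifted by one. This is a minimal sketch, not DialoGPT’s code. Putting an end-of-text marker after every turn is an assumption of this sketch (it mirrors how the Hugging Face DialoGPT examples append the EOS token to each utterance); the quoted passage itself only says the session ends with it. The token ids are made up.

```python
EOS = "<|endoftext|>"  # GPT-2's end-of-text token string

def session_to_text(turns, eos=EOS):
    """Concatenate all turns of one dialogue session into one long text.
    Assumption: an EOS marker follows every turn, so the session also ends with EOS."""
    return "".join(turn + eos for turn in turns)

def lm_pairs(token_ids):
    """Language-modeling targets are the inputs shifted left by one position."""
    return token_ids[:-1], token_ids[1:]

text = session_to_text(["Hi, how are you?", "Fine, thanks. You?"])
x, y = lm_pairs([5, 8, 2, 9])  # hypothetical token ids for illustration
print(text)
print(x, y)  # [5, 8, 2] [8, 2, 9]
```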


That’s great, I will look into it :smiley:


For those wanting an update regarding this course, please see here:



Looking at the IMDb example using the DataBlock API:

from fastai2.text.all import *  # fastai2-era import; current releases use fastai.text.all
imdb_lm = DataBlock(blocks=TextBlock.from_df('text', is_lm=True), get_x=ColReader('text'), splitter=RandomSplitter(0.1))

I want to confirm my understanding of this example…

  • We have a DataBlock
  • Within that DataBlock we have many TextBlocks
  • Each TextBlock has a method .from_df, which in this example is saying go to our ‘text’ df, and we are specifying it’s a language model
  • Once the TextBlock has this data it will use get_x from the ‘text’ column using ColReader
  • And finally the data is being split 90:10 for training and validation set

This next part is what I’d like confirmation or correction on. I understand the answer is probably in the source code, but I can’t fully follow it.

Q: Is the get_x being performed by each TextBlock here or the DataBlock?
Q: Similar to the above question, is the data split once the TextBlocks make up the DataBlock, or is it split per TextBlock?

Any help would be appreciated! :smiley:

I think my new example may help some. I’m in the middle of redoing the course material, so check out the revamped lesson 1. However, it works just like the regular DataBlock API: one TextBlock for your input, other blocks for your output.
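As a rough mental model for the two DataBlock questions (a toy sketch of the idea as I understand it, not fastai’s source): the DataBlock runs the splitter once over the full list of items, and `get_x`/`get_y` are then applied per item, each feeding its own block. All names below, including `toy_datablock` and `random_splitter`, are hypothetical.

```python
import random

def random_splitter(valid_pct=0.1, seed=42):
    """Hypothetical stand-in for RandomSplitter: shuffle indices, hold out valid_pct."""
    def _split(items):
        idxs = list(range(len(items)))
        random.Random(seed).shuffle(idxs)
        cut = int(valid_pct * len(items))
        return idxs[cut:], idxs[:cut]
    return _split

def toy_datablock(items, get_x, get_y, splitter):
    """Toy model: split ONCE on the whole item list, then apply the getters per item."""
    train_idx, valid_idx = splitter(items)
    def build(idxs):
        # get_x feeds the input block, get_y feeds the target block
        return [(get_x(items[i]), get_y(items[i])) for i in idxs]
    return build(train_idx), build(valid_idx)

rows = [{"text": f"review {i}", "label": i % 2} for i in range(20)]
train, valid = toy_datablock(rows, get_x=lambda r: r["text"], get_y=lambda r: r["label"],
                             splitter=random_splitter(0.1))
print(len(train), len(valid))  # 18 2
```

So in this picture there is only one split, shared by every block, rather than one split per TextBlock.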

Hi @muellerzr, I’m looking at your updated 01_Intro notebook, and I need some help understanding the learning rate adjuster schema and how you come up with it.

adj = 2.6**4

Maybe this is just a heuristic; if you could share the rationale behind it, it’d help me apply it to a different dataset.

Also, I noticed you were adjusting the lr based on how you were unfreezing. I seem to be missing some basic info on why this is the case. Did I miss this in one of the lessons somewhere?


I’d check the course-v3 lesson video on ULMFiT. It comes directly from the paper; this is how they trained the model :slight_smile:
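For reference, the 2.6 comes from the ULMFiT paper, where each lower layer’s learning rate is the one above it divided by 2.6. If I understand fastai correctly, passing `slice(lr/2.6**4, lr)` spreads learning rates geometrically across the model’s layer groups, so with five layer groups (which I believe is what the AWD_LSTM classifier has; treat that count as an assumption) neighbouring groups differ by exactly 2.6. The `geometric_lrs` helper is my own sketch of that spacing, not fastai’s function.

```python
def geometric_lrs(lr_min, lr_max, n_groups):
    """Spread learning rates geometrically from lr_min (first group) to lr_max (last group)."""
    if n_groups == 1:
        return [lr_max]
    step = (lr_max / lr_min) ** (1 / (n_groups - 1))
    return [lr_min * step ** i for i in range(n_groups)]

lr = 1e-2
adj = 2.6 ** 4
lrs = geometric_lrs(lr / adj, lr, n_groups=5)  # assumed: 5 layer groups
ratios = [b / a for a, b in zip(lrs, lrs[1:])]
print(ratios)  # each ratio is (approximately) 2.6
```

With a different number of layer groups the exponent would need to change (n_groups - 1) to keep the same 2.6 ratio.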


Thank you, kind sir!

I had come across this and noted it, and somehow forgot it! I had even looked at the forums and discussed it. Duh!

Anyway, Hiromi’s notes are a great place to revise it:

Ctrl-F -> ‘magic’


Does anyone know how to deal with overfitting in an RNN? I hope Zach talks about this in subsequent lessons.

For example, I see something like this while training:


My valid_loss is not going down but my train_loss is, so the two are drifting apart. I’m trying to overfit first and then regularize. How do I do the regularizing part now?

I tried to change drop_mult, but it does not seem to be an attribute of the learner. Does anyone else have experience with this?



I still don’t know how to change drop_mult, or alter it in the middle of my training loop interactively based on my loss trend. However, I set drop_mult to 1 when creating the learner, and I am seeing much better behavior that lets me train for longer and reach better scores.
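Some context on why drop_mult is not a learner attribute: as I understand fastai’s text learners, `drop_mult` is applied once when the model is built, multiplying every dropout probability in the AWD_LSTM config (the keys ending in `_p`). The sketch below mimics that scaling; the config values are illustrative placeholders, not necessarily fastai’s defaults, so check your version’s AWD_LSTM config.

```python
def scale_dropouts(config, drop_mult):
    """Return a copy of `config` with every dropout probability (keys ending in '_p') scaled."""
    return {k: (v * drop_mult if k.endswith("_p") else v) for k, v in config.items()}

# Illustrative config (hypothetical values; not guaranteed to match fastai's)
base = {"emb_sz": 400, "n_layers": 3, "output_p": 0.4, "hidden_p": 0.3,
        "input_p": 0.4, "embed_p": 0.05, "weight_p": 0.5}
half = scale_dropouts(base, drop_mult=0.5)  # weaker regularization
full = scale_dropouts(base, drop_mult=1.0)  # dropouts left at the base values
print(half["hidden_p"], full["hidden_p"])
```

Because the scaling happens at build time, changing drop_mult means rebuilding the model (or editing the dropout layers directly), not setting an attribute on the learner.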

Also, using lower values of moms helps. The default moms for text_classifier_learner is moms = (0.95, 0.85, 0.95).
I’m seeing a better trend with:
moms = (0.8, 0.7, 0.8)
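For context, `moms` is a (start, middle, end) triple: during `fit_one_cycle`, momentum is annealed from the first value down to the second while the learning rate warms up, then back up to the third. The sketch below is my reconstruction of that cosine schedule, assuming fastai’s default `pct_start=0.25`; the function names are my own. Lower momentum means the optimizer averages over fewer past gradients.

```python
import math

def cos_anneal(start, end, pos):
    """Cosine interpolation from `start` (pos=0) to `end` (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_mom(pos, moms=(0.95, 0.85, 0.95), pct_start=0.25):
    """Momentum at training progress `pos` in [0, 1] for a (start, middle, end) moms tuple."""
    start, middle, end = moms
    if pos < pct_start:
        return cos_anneal(start, middle, pos / pct_start)
    return cos_anneal(middle, end, (pos - pct_start) / (1 - pct_start))

for moms in [(0.95, 0.85, 0.95), (0.8, 0.7, 0.8)]:
    print(moms, [round(one_cycle_mom(p, moms), 3) for p in (0.0, 0.25, 0.5, 1.0)])
```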

Not sure if it’s possible to alter drop_mult after setting up the Learner, but what you could do is add or increase weight decay by passing e.g. wd=0.01 or higher to learn.fit_one_cycle(...)
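Weight decay shrinks the weights a little on every optimizer step, which discourages large weights and usually reduces overfitting. As far as I know, fastai applies it in the decoupled (“true weight decay”) form by default, multiplying each parameter by `1 - lr*wd` separately from the gradient update. The plain-Python SGD step below is a sketch of that idea, not fastai’s optimizer code; `sgd_step_with_wd` is a hypothetical name.

```python
def sgd_step_with_wd(params, grads, lr, wd):
    """One SGD step with decoupled weight decay: decay the weights, then follow the gradient."""
    return [p * (1 - lr * wd) - lr * g for p, g in zip(params, grads)]

params = [1.0, -2.0]
no_grad = [0.0, 0.0]
print(sgd_step_with_wd(params, no_grad, lr=0.1, wd=0.01))  # weights shrink by 0.1% even with zero gradient
```

Raising wd makes that per-step shrinkage stronger, which is the regularizing effect suggested above.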
