@wgpubs recently released a library for using HuggingFace transformers with fastai. As of writing I don’t think you can pre-train with it yet, but the classification element should work: https://ohmeow.github.io/blurr/
Sylvain also released a fastai transformers tutorial. Right now it only covers text generation, but it’s worth a look to see how he integrates HF and fastai: http://dev.fast.ai/tutorial.transformers
One disadvantage to training transformers from scratch is that the impressive results they’ve gotten have come from using really huge amounts of data, and they take a long time to pre-train. So I would either start with a pre-trained transformer model or pre-train an AWD_LSTM.
Hi @muellerzr, once the language model is created, the model understands the language, so from there is it possible to take it to chatbots? Has anybody worked on that?
We follow OpenAI GPT-2 in modeling a multi-turn dialogue session as a long text and frame the generation task as language modeling. We first concatenate all dialogue turns within a session into a long text x_1, …, x_N (N is the sequence length), ended by the end-of-text token.
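A minimal sketch of that framing, assuming a HuggingFace GPT-2 tokenizer (the turns below are made-up illustration data):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Hypothetical dialogue session: each element is one turn.
turns = ["Hi, how are you?", "Great, thanks. And you?", "Doing well!"]

# Concatenate all turns into one long text, separated and ended by the
# end-of-text token, so the whole session is a single LM sequence.
eos = tokenizer.eos_token  # "<|endoftext|>" for GPT-2
text = eos.join(turns) + eos

input_ids = tokenizer(text, return_tensors="pt").input_ids  # the x_1, ..., x_N above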
I want to confirm my understanding of this example…
We have a DataBlock
Within that DataBlock we have many TextBlocks
Each TextBlock has a method .from_df, which in this example says to read from the ‘text’ column of our df, and we are specifying it’s a language model
Once the TextBlock has this data, get_x will read from the ‘text’ column using ColReader
And finally the data is split 90:10 into training and validation sets
The next part I’d like confirmation/correction on. I understand it’s probably in the source code, but I can’t fully understand it.
Q: Is the get_x being performed by each TextBlock here, or by the DataBlock?
Q: Similar to the above, is the data split once the TextBlocks form the DataBlock, or is it split per TextBlock?
I think my new example may help some. I’m in the middle of redoing the course material, so check out the revamped lesson 1; it may help. However, it’s just like the regular DataBlock API: one TextBlock for your input, other blocks for your output.
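To make the questions above concrete, here’s a minimal sketch of that kind of DataBlock (df is assumed to be your DataFrame with a ‘text’ column, as in the example):

from fastai.text.all import *

dls_lm = DataBlock(
    # One TextBlock for the input; is_lm=True marks it as a language model
    blocks=TextBlock.from_df('text', is_lm=True),
    # get_x is an argument of the DataBlock, not of each TextBlock; it reads
    # the tokenized 'text' column that TextBlock.from_df produces
    get_x=ColReader('text'),
    # The split happens once, at the DataBlock level, over whole rows
    splitter=RandomSplitter(valid_pct=0.1)
).dataloaders(df, bs=64)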
Hi @muellerzr, I’m looking at your updated 01_Intro notebook, and I need some help understanding the learning rate adjuster schema and how you came up with it.
adj = 2.6**4
Maybe this is just a heuristic, and if you could share the rationale behind it, it’d help me apply it to a different dataset.
Also, I noticed you were adjusting the lr based on how you were unfreezing. I seem to be missing some basic info on why this is the case. Did I miss this in one of the lessons somewhere?
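For reference, the pattern I’m asking about looks roughly like this (the lr values here are placeholders, not necessarily the notebook’s):

from fastai.text.all import *
# learn is assumed to be a text_classifier_learner as in the notebook

adj = 2.6**4  # divisor between the top and bottom layer-group lrs

learn.fit_one_cycle(1, 1e-2)                   # train only the new head
learn.freeze_to(-2)                            # unfreeze one more layer group
learn.fit_one_cycle(1, slice(1e-2/adj, 1e-2))  # lowest group gets lr/adj
learn.freeze_to(-3)
learn.fit_one_cycle(1, slice(5e-3/adj, 5e-3))
learn.unfreeze()                               # unfreeze everything
learn.fit_one_cycle(2, slice(1e-3/adj, 1e-3))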
I wonder if anyone knows how to deal with overfitting with an RNN? I hope Zach talks about this in subsequent lessons.
For example, I have something like this, while training:
My valid_loss is not going down, but my train_loss is, and it’s drifting away from valid_loss. I’m trying to first overfit and then regularize. How do I do that now?
I tried to change drop_mult, but it does not seem to be an attribute of the learner. Does anyone else have experience with this?
================================
UPDATE:
I still don’t know how to change drop_mult, or alter it in the middle of my training loop, interactively, based on my loss trend. However, I initialized with drop_mult of 1 when creating the learner, and I’m seeing much better behavior that lets me train for longer and reach better scores.
Also, using lower values of moms helps. The moms default for text_classifier_learner is moms=(0.95, 0.85, 0.95); I’m seeing a better trend with moms=(0.8, 0.7, 0.8).
Not sure if it’s possible to alter drop_mult after setting up the Learner, but what you could do is add/increase weight decay by passing e.g. wd=0.01 or higher values to learn.fit_one_cycle(...)
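A minimal sketch of both knobs, assuming the fastai v2 text API (dls and the numbers are placeholders):

from fastai.text.all import *

# drop_mult scales every dropout probability in the AWD-LSTM, so it has
# to be set when the Learner is created:
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=1.0, metrics=accuracy)

# Weight decay and momentum, by contrast, can be changed per fit call:
learn.fit_one_cycle(4, 1e-2, wd=0.1, moms=(0.8, 0.7, 0.8))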