Lesson 8 - Official topic

Fastbook chapter 10 questionnaire solutions:

Compare your answers or feel free to contribute!

@rachel's SciPy 2019 Keynote: https://www.youtube.com/watch?v=KChtdexd5Jo

Regarding this threat of using advanced language models to manipulate public opinion, I covered it in more detail in my SciPy keynote last summer.

Edited to add: the first half is a high-level overview of the use of transfer learning in NLP (which will be review for you now), and the second half is on the risks of manipulating public opinion, disinformation, etc.

6 Likes

In the previous lesson's MNIST example, you showed us that "under the hood" the model was learning parts of the image, like the curves of a 3 or the angles of a 7.

Is there a way to look under the hood of the language models to see if they are learning rules of grammar/syntax?

Would it be a good idea to fine-tune models with examples of domain-specific grammar/syntax (like technical manuals), or does that miss the point of having the model learn for itself?

6 Likes

It seems some people are interested in NLP model interpretability.

Again, PyTorch Captum seems to be an amazing tool for studying this.

They even have an example for text classification over here

It seems there is a fastai2 callback for ResNet interpretation using Captum over here. Maybe this could be useful for doing something similar with NLP models?

Additionally, fastai v1 had an NLP interpretation class over here

5 Likes
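To make the Captum suggestion above a bit more concrete, here is a minimal sketch of layer integrated gradients on a toy text classifier. The model, the fake token ids, and the all-zeros baseline are invented purely for illustration; Captum's text classification tutorial linked above does the same thing with a real model plus visualization helpers.

```python
import torch
import torch.nn as nn
from captum.attr import LayerIntegratedGradients

# Toy text classifier: embedding -> mean pool -> linear head.
class TinyTextClassifier(nn.Module):
    def __init__(self, vocab_size=100, emb_dim=16, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.head = nn.Linear(emb_dim, n_classes)

    def forward(self, ids):
        return self.head(self.embedding(ids).mean(dim=1))

model = TinyTextClassifier().eval()
input_ids = torch.tensor([[5, 17, 42, 8]])   # a made-up tokenized sentence
baseline_ids = torch.zeros_like(input_ids)   # e.g. an all-<pad> baseline

# Integrated gradients through the embedding layer, explaining class 1.
lig = LayerIntegratedGradients(model, model.embedding)
attributions, delta = lig.attribute(
    inputs=input_ids,
    baselines=baseline_ids,
    target=1,
    return_convergence_delta=True,
)

# Sum over the embedding dimension to get one importance score per token.
token_scores = attributions.sum(dim=-1).squeeze(0)
print(token_scores)
```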

Again Rachel's mic is breaking up

2 Likes

I don't know NLP too well, but from what I understand, there is a debate in the literature about whether attention layers serve as explanations. Note that attention in NLP gave rise to the popular Transformer models (from the paper titled "Attention Is All You Need").

For example,

4 Likes

The mic is better now.

4 Likes

It actually sounds a little robotic, or like there's some noise.

One thing that's useful is to look at the embeddings: you can, for example, see that words with similar or related meanings are close to each other in the embedding space. That's a sign that the model has learned these connections.

1 Like
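As a rough sketch of that kind of inspection, you can take the embedding matrix and look at cosine-similarity nearest neighbours. The `emb` matrix and `vocab` below are random stand-ins; with a trained fastai language model you would use the weights of the encoder's embedding layer instead.

```python
import torch
import torch.nn.functional as F

# Stand-in embedding matrix and vocabulary (replace with your trained model's).
emb = torch.randn(1000, 300)                  # vocab_size x emb_dim
vocab = [f"token_{i}" for i in range(1000)]

def nearest_words(idx, k=5):
    """Tokens whose embeddings are most cosine-similar to token `idx`."""
    sims = F.cosine_similarity(emb[idx].unsqueeze(0), emb, dim=1)
    top = sims.topk(k + 1).indices[1:]        # drop the token itself
    return [vocab[int(i)] for i in top]

print(nearest_words(42))
```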

Can we get data augmentation by asking our language model to translate a database of sentences from English to English?

1 Like

What do you mean English => English? I would think there's no translation being done.

Seems like a neat idea, but where would you find this database? That's often the problem with bootstrapping these kinds of projects.

English to an intermediate language and back to English is often done for NLP data augmentation.

1 Like
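For reference, a rough sketch of that round trip (back-translation) using Hugging Face translation pipelines; the MarianMT model names are just one possible choice, and the intermediate language is arbitrary.

```python
from transformers import pipeline

# English -> French -> English round trip ("back-translation").
en_to_fr = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
fr_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

def back_translate(text):
    """Return a paraphrased version of `text` via a French round trip."""
    fr = en_to_fr(text)[0]["translation_text"]
    return fr_to_en(fr)[0]["translation_text"]

print(back_translate("The movie was surprisingly good and I really enjoyed it."))
```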

My guess is that the gain was too high, causing the mic to saturate, which gives rise to distortion.

1 Like

Is there a way to speed up fine-tuning the NLP model? 10+ minutes per epoch slows down the iterative process quite a bit… Any best practices/tips?

Edit for other students in a similar situation: FYI, it took me 26 minutes to train the first epoch. I'm on Google Cloud with almost exactly the recommended setup. Curious what others are seeing?

2 Likes

The simplest thing without changing the dataset is maybe to try mixed precision: learn.to_fp16()

Use a smaller dataset. IMDb is huge.

1 Like
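A rough sketch combining both suggestions (the small IMDb sample plus mixed precision), assuming the fastai v2 text API and the column layout of the IMDB_SAMPLE csv:

```python
from fastai.text.all import *

# Fine-tune the language model on the small IMDb sample, in mixed precision.
path = untar_data(URLs.IMDB_SAMPLE)
dls_lm = TextDataLoaders.from_csv(path, 'texts.csv', text_col='text', is_lm=True)

learn = language_model_learner(
    dls_lm, AWD_LSTM, drop_mult=0.3, metrics=[accuracy, Perplexity()]
).to_fp16()
learn.fine_tune(1, 1e-2)
```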

Right. But my question is: would translating English directly to English also work?

But would any translation service actually do anything? I've never checked, but I assume it would just give you back the same text.