Fastbook chapter 10 questionnaire solutions:
Compare your answers or feel free to contribute!
@rachel's SciPy 2019 Keynote: https://www.youtube.com/watch?v=KChtdexd5Jo
Regarding this threat of using advanced language models to manipulate public opinion: I covered this in more detail in my SciPy keynote last summer.
Edited to add: the first half is a high-level overview of the use of transfer learning in NLP (which will be review for you now), and the second half is on the risks of manipulating public opinion, disinformation, etc.
In the previous lesson's MNIST example, you showed us that "under the hood" the model was learning parts of the image, like the curves of a 3 or the angles of a 7.
Is there a way to look under the hood of the language models to see if they are learning rules of grammar/syntax?
Would it be a good idea to fine-tune models with examples of domain-specific grammar/syntax (like technical manuals), or does that miss the point of having the model learn for itself?
It seems some people are interested in NLP model interpretability.
Again, PyTorch Captum seems to be an amazing tool for studying this.
They even have an example for text classification over here
It seems there is a fastai2 callback for ResNet interpretation using Captum over here. Maybe this could be useful for doing something similar with NLP models?
Additionally, fastai v1 had an NLP interpretation class over here
Again Rachel's mic is breaking up
I don't know NLP too well, but from what I understand, there is a debate in the literature about whether attention layers serve as explanations. Note that attention in NLP gave rise to the popular Transformer models (introduced in the paper "Attention Is All You Need").
For example,
Now is better - the mic
it actually sounds a little robotic, or like there is some noise
One thing that's useful is to look at the embeddings: you can, for example, see that words with similar or connected meanings are close to each other in the embedding space. That's a sign that the model has learned these connections.
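To illustrate the idea with a toy example: cosine similarity in the embedding space is higher for related words. The three-dimensional vectors below are made up for the sketch, not taken from a real model; learned embeddings typically have hundreds of dimensions.

```python
import math

# Toy "embeddings" (hand-made for illustration; not from a trained model).
embeddings = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.75, 0.20],
    "apple": [0.10, 0.20, 0.90],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Related words score higher than unrelated ones.
print(cosine(embeddings["king"], embeddings["queen"]))  # high (~0.99)
print(cosine(embeddings["king"], embeddings["apple"]))  # low  (~0.30)
```

With real embeddings you would pull the vectors out of the model's embedding layer and compare them the same way.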
Can we get data augmentation by asking our language model to translate a database of sentences from English to English?
What do you mean by English => English? I would think there's no translation being done.
Seems like a neat idea, but where would you find this database? Thatās often the problem with bootstrapping these kinds of projects.
English to an intermediate language and back to English is often done for NLP data augmentation.
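As a sketch of that round-trip ("back-translation") idea: the code below fakes the translation step with a placeholder function, since a real pipeline would call a translation model or service (for example MarianMT via Hugging Face). Everything here except the overall shape of the pipeline is an assumption.

```python
def translate(text: str, src: str, dst: str) -> str:
    """Placeholder translator. A real implementation would call a
    translation model or API; this stub just returns the input."""
    return text

def back_translate(sentences, pivot="fr"):
    """English -> pivot language -> English.

    With a real translator, the round trip usually returns a paraphrase
    of the original sentence, which can be used as augmented data.
    """
    augmented = []
    for s in sentences:
        pivot_text = translate(s, src="en", dst=pivot)
        round_trip = translate(pivot_text, src=pivot, dst="en")
        augmented.append(round_trip)
    return augmented

corpus = ["The movie was surprisingly good."]
print(back_translate(corpus))
```

The augmented sentences are then added to the training set alongside the originals, with the original labels.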
My guess is that the gain was too high, causing the mic to saturate, which gives rise to distortion.
Is there a way to speed up fine-tuning the NLP model? 10+ minutes per epoch slows down the iterative process quite a bit… Any best practices/tips?
Edit for other students in a similar situation: FYI it took me 26 minutes to train the first epoch. Iām on Google Cloud with almost exactly the recommended setup. ā Curious what others were seeing?
The simplest thing, without changing the dataset, is maybe to try mixed precision: learn.to_fp16()
Use a smaller dataset. IMDb is huge.
Right. But my question is would translating English directly to English also work?
But would any translation service actually do anything? I have never checked, but I assume they would just give you back the same text.