Lesson 8 - Official topic

What BERT does by masking tokens can be used for data augmentation: you can randomly blank out tokens or replace them with random tokens.
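A minimal sketch of that idea, assuming you already have a tokenized sentence and a vocabulary to sample replacements from (the function name and probabilities here are illustrative, not from any particular library):

```python
import random

def mask_augment(tokens, vocab, p_mask=0.1, p_replace=0.1, mask_token="[MASK]"):
    """BERT-style augmentation: randomly blank tokens or swap in random ones."""
    out = []
    for tok in tokens:
        r = random.random()
        if r < p_mask:
            out.append(mask_token)            # blank out this token
        elif r < p_mask + p_replace:
            out.append(random.choice(vocab))  # replace with a random vocab token
        else:
            out.append(tok)                   # keep the token unchanged
    return out
```

Each call produces a slightly different corrupted copy of the sentence, so you can generate several augmented variants per training example.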




20 data samples! Mind blowing :scream:


Folks with prior experience with the albumentations library can check this out: NLP Albumentation Kaggle kernel

How do we get pretrained language models for combined languages (e.g. some non-English speakers mix two different languages in their tweets)?

Here it is @sgugger : https://forums.fast.ai/t/bug-in-predict-for-language-model-learner/70523

The language model gives you the next ‘best’ word. Is it possible to tell it to give you the top n candidate words, so you can decide which one to use? This is in case you want to use the language model as a content generator.

I haven’t tried this extensively on tasks, but here is a summary writeup of the paper EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
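For reference, two of the four EDA operations from that paper (random swap and random deletion) are simple enough to sketch in a few lines; this is an illustrative reimplementation, not the authors' released code:

```python
import random

def random_swap(tokens, n=1):
    """EDA: swap two randomly chosen tokens, n times."""
    toks = tokens[:]
    for _ in range(n):
        i, j = random.sample(range(len(toks)), 2)
        toks[i], toks[j] = toks[j], toks[i]
    return toks

def random_deletion(tokens, p=0.1):
    """EDA: drop each token with probability p (always keep at least one)."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]
```

The other two operations (synonym replacement and random insertion) need a synonym source such as WordNet, so they are omitted here.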


Re the question on seq2seq, you can also check out this lesson from the NLP class:


Yep it’s possible! (Look for beam-search!)


Indeed. That’s called beam search, as opposed to a greedy approach.
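To make the contrast concrete, here is a toy beam search over a hypothetical next-token distribution (the probability table is made up for illustration; a real language model would supply these from its softmax output):

```python
import math
from heapq import nlargest

def next_token_probs(context):
    """Toy stand-in for a language model's next-token distribution."""
    table = {
        (): {"the": 0.6, "a": 0.4},
        ("the",): {"cat": 0.5, "dog": 0.5},
        ("a",): {"cat": 0.9, "dog": 0.1},
    }
    return table.get(tuple(context), {"<eos>": 1.0})

def beam_search(beam_width=2, max_len=2):
    # Each beam entry is (cumulative log-probability, token sequence).
    beams = [(0.0, [])]
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            for tok, p in next_token_probs(seq).items():
                candidates.append((logp + math.log(p), seq + [tok]))
        # Keep only the `beam_width` most probable partial sequences.
        beams = nlargest(beam_width, candidates, key=lambda c: c[0])
    return beams
```

In this toy table a greedy decoder picks “the” first (0.6) and ends at probability 0.6 × 0.5 = 0.3, while beam search keeps “a” alive and finds “a cat” at 0.4 × 0.9 = 0.36, a more probable sequence overall.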


fastai v1 had some tools that might be similar to what you are looking for:

Probably a lot of hidden information gets ingrained into the text during the translation process.

Would some images associated to specific sentences also help as augmentation? If so, how could that be implemented?

Is the language model trained on IMDb that gets loaded composed only of the word embedding parameters, or also of the weights of the LSTM model?


Fastbook chapter 10 questionnaire solutions:

Compare your answers or feel free to contribute!

@rachel’s SciPy 2019 Keynote: https://www.youtube.com/watch?v=KChtdexd5Jo

Regarding the threat of advanced language models being used to manipulate public opinion, I covered this in more detail in my SciPy keynote last summer:

Edited to add: the first half is a high-level overview of the use of transfer learning in NLP (which will be review for you now) and the 2nd half is on the risks of manipulating public opinion, disinfo, etc.


In the previous lesson MNIST example, you showed us that “under the hood” the model was learning parts of the image, like curves of a 3 or angles of a 7.

Is there a way to look under the hood of the language models to see if they are learning rules of grammar/syntax?

Would it be a good idea to fine-tune models with examples of domain-specific grammar/syntax (like technical manuals), or does that miss the point of having the model learn for itself?