Lesson 8 - Official topic

ah it looks like that’s mentioned in the notebook

There are no augmentation transforms in NLP in fastai.

Question from Dinesh Chauhan of the TWiML study group: What are the best techniques for data augmentation in NLP?


Is there a way to visualize what the inner layers of an RNN are doing for NLP, analogous to what is done with CNNs for images?

What BERT does by masking tokens can be used for data augmentation: you can randomly blank tokens or replace them by random tokens.
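To make the masking idea concrete, here is a minimal sketch in plain Python. It is not fastai or BERT code: the function name `mask_or_replace`, the `[MASK]` placeholder, the 50/50 split between blanking and replacing, and the default probability are all assumptions chosen for illustration.

```python
import random

def mask_or_replace(tokens, vocab, p=0.15, mask_token="[MASK]", seed=None):
    """Return a copy of `tokens` where each token is altered with probability `p`.

    An altered token is either blanked (replaced by `mask_token`) or swapped
    for a random token drawn from `vocab`. All defaults are illustrative.
    """
    rng = random.Random(seed)
    augmented = []
    for tok in tokens:
        if rng.random() < p:
            # Assumed 50/50 split between blanking and random replacement.
            if rng.random() < 0.5:
                augmented.append(mask_token)
            else:
                augmented.append(rng.choice(vocab))
        else:
            augmented.append(tok)
    return augmented

sentence = "this movie was surprisingly good".split()
vocab = ["film", "bad", "great", "boring", "the"]
print(mask_or_replace(sentence, vocab, p=0.3, seed=0))
```

Each call with a different seed gives a different noisy copy of the same sentence, so one labelled example can be turned into several training examples.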



20 data samples! Mind blowing :scream:


Folks who have prior experience with the albumentations library can check this out: NLP Albumentation Kaggle kernel

How do we get pretrained language models for combined languages (e.g. some non-English speakers mix two different languages in their tweets)?

Here it is @sgugger : https://forums.fast.ai/t/bug-in-predict-for-language-model-learner/70523

The language model gives you the next ‘best’ word. Is it possible to tell it to give you the n most likely next words, so that you can then decide which one to use? This is in case you want to use the language model as a content generator.
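For what it's worth, here is a rough sketch of what "give me the n most likely next words" amounts to once you have the model's raw scores for the next token. It assumes you already have a list of scores (`logits`) aligned with a vocabulary list; how to pull those out of a particular learner is not shown, and `top_n_next_words` is a made-up helper name.

```python
import heapq
import math

def top_n_next_words(logits, vocab, n=5):
    """Return the `n` most probable next words as (word, probability) pairs.

    `logits` is assumed to hold one raw score per entry of `vocab`.
    """
    # Softmax, shifted by the max score for numerical stability.
    top = max(logits)
    exps = [math.exp(score - top) for score in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the n largest probabilities, best first.
    best = heapq.nlargest(n, range(len(probs)), key=probs.__getitem__)
    return [(vocab[i], probs[i]) for i in best]

vocab = ["the", "movie", "was", "great"]
logits = [0.1, 2.0, 1.0, 3.0]
for word, prob in top_n_next_words(logits, vocab, n=2):
    print(f"{word}: {prob:.3f}")
```

You can then pick from the candidates yourself, or sample from them in proportion to their probabilities.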

I haven’t tried this extensively on tasks, but here is a summary writeup of the paper EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
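The EDA paper describes four operations: synonym replacement, random insertion, random swap, and random deletion. As a rough illustration, here is a sketch of the two that need no thesaurus. This is my own minimal version, not the authors' code; the function names and default parameters are assumptions.

```python
import random

def random_swap(tokens, n_swaps=1, rng=random):
    """Swap two randomly chosen positions, `n_swaps` times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1, rng=random):
    """Drop each token independently with probability `p`."""
    tokens = list(tokens)
    if len(tokens) <= 1:
        return tokens
    kept = [tok for tok in tokens if rng.random() >= p]
    # Never return an empty sentence: fall back to one random token.
    return kept if kept else [rng.choice(tokens)]

sentence = "the plot was thin but the acting was superb".split()
rng = random.Random(42)
print(random_swap(sentence, n_swaps=2, rng=rng))
print(random_deletion(sentence, p=0.2, rng=rng))
```

Both functions keep the label of the original example, which is the assumption EDA relies on: small word-level noise should not change the class of the text.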


Re the question on seq2seq, you can also check out this lesson from the NLP class:


Yep it’s possible! (Look for beam-search!)


Indeed. That’s called beam search, as opposed to a greedy approach.
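Here is a small sketch of beam search versus the greedy approach. The language model is faked with a hand-written lookup table (`TOY` and `toy_next_probs` are invented for this example), so it only shows the search logic, not how to hook it up to a real model.

```python
import math

def beam_search(next_probs, start, steps, beam_width=3):
    """Keep the `beam_width` best partial sequences at every step.

    `next_probs(seq)` is assumed to return a dict mapping each candidate
    next token to its probability. Returns (sequence, log-probability)
    pairs, best first. `beam_width=1` is the greedy approach.
    """
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, p in next_probs(seq).items():
                if p > 0:
                    candidates.append((seq + [tok], score + math.log(p)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Toy "model": next-token probabilities depend only on the last token.
TOY = {
    "a": {"b": 0.6, "c": 0.4},
    "b": {"d": 0.55, "e": 0.45},
    "c": {"f": 0.9, "g": 0.1},
}

def toy_next_probs(seq):
    return TOY.get(seq[-1], {})

print(beam_search(toy_next_probs, "a", steps=2, beam_width=1)[0][0])  # greedy
print(beam_search(toy_next_probs, "a", steps=2, beam_width=2)[0][0])  # beam
```

In this toy table the greedy search commits to "b" because it is the best first step, while the wider beam also keeps "c" alive and ends up with a more probable full sequence.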

fastai v1 had some tools that might be similar to what you are looking for:

Probably a lot of hidden information is ingrained into the text during the translation process.

Would images associated with specific sentences also help as augmentation? If so, how could that be implemented?

Is the loaded language model that was trained on IMDB composed only of the word-embedding parameters, or does it also include the weights of the LSTM model?