ah it looks like that’s mentioned in the notebook
There are no data augmentation transforms for NLP in fastai.
Question from Dinesh Chauhan of the TWiML study group: What are the best techniques for data augmentation in NLP?
Is there a way to visualize what the inner layers of an RNN are doing for NLP, analogous to what we can do with CNNs for images?
What BERT does by masking tokens can be used for data augmentation: you can randomly blank tokens or replace them with random tokens.
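To make that concrete, here is a minimal sketch of that kind of augmentation on a tokenized sentence. The `xxmask` token, the probability `p`, and the toy vocab are just illustrative choices, not a fastai API:

```python
import random

def augment_tokens(tokens, mask_tok="xxmask", p=0.15, vocab=None):
    "Randomly blank tokens or swap them for random vocab tokens (BERT-style masking used as augmentation)."
    out = []
    for tok in tokens:
        r = random.random()
        if r < p / 2:                      # blank the token
            out.append(mask_tok)
        elif r < p and vocab is not None:  # replace it with a random token
            out.append(random.choice(vocab))
        else:                              # keep the original token
            out.append(tok)
    return out

# Each call produces a slightly different variant of the same sentence
toks = "this movie was surprisingly good".split()
print(augment_tokens(toks, vocab=["film", "bad", "great", "boring"]))
```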
20 data samples! Mind blowing
Folks can check this out if they have prior experience with the albumentations library: the NLP Albumentations Kaggle kernel.
How do we get pretrained language models for combined languages (e.g., some non-English speakers mix two different languages in their tweets)?
The language model gives you the next ‘best’ word. Is it possible to tell it to give you the top n possible words, so you can then decide which one to use? This is in case you want to use the language model as a content generator.
I haven’t tried this extensively on tasks, but here is a summary writeup of the paper EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
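For context, two of the four EDA operations (random swap and random deletion) need nothing beyond the tokenized sentence; the other two (synonym replacement and random insertion) additionally need a synonym source such as WordNet. A rough sketch of the first two, with illustrative function names and probabilities:

```python
import random

def random_swap(words, n=1):
    "EDA random swap: swap the positions of two random words, n times."
    words = words.copy()
    for _ in range(n):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    "EDA random deletion: drop each word with probability p (always keep at least one)."
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

sent = "the plot was thin but the acting carried it".split()
print(" ".join(random_swap(sent)))
print(" ".join(random_deletion(sent)))
```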
Re the question on seq2seq, you can also check out this lesson from the NLP class:
Yep it’s possible! (Look for beam-search!)
Indeed. That’s called beam search, as opposed to a greedy approach.
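As a rough sketch of the difference: greedy decoding takes the single argmax at each step, whereas you can instead ask for the top n candidates and choose among them; beam search goes further and keeps the n best partial sequences at every step rather than committing to one word at a time. The vocab and logits below are made up purely to show the mechanics:

```python
import torch

def next_word_candidates(logits, vocab, n=5):
    "Return the n most probable next words instead of just the greedy argmax."
    probs = torch.softmax(logits, dim=-1)
    top_p, top_i = probs.topk(n)
    return [(vocab[i.item()], p.item()) for p, i in zip(top_p, top_i)]

# Toy example with a hypothetical 5-word vocabulary and made-up logits
vocab = ["the", "movie", "was", "great", "terrible"]
logits = torch.tensor([0.1, 2.0, 0.3, 1.5, 1.2])
print(next_word_candidates(logits, vocab, n=3))
```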
fastai v1 had some tools that might be similar to what you are looking for:
https://docs.fast.ai/text.interpret.html
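For reference, this is roughly how that v1 interpretation API was used; it is only a sketch, assuming a text classifier `learn` that has already been trained, and the class/method names are taken from the v1 docs page linked above:

```python
from fastai.text import *  # fastai v1

# learn = a text classification Learner already trained (e.g. on IMDb)
interp = TextClassificationInterpretation.from_learner(learn)
# Highlights which input tokens contributed most to the prediction
interp.show_intrinsic_attention("I really loved that movie, it was awesome!")
```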
Probably a lot of hidden information is ingrained in the text during the translation process.
Would some images associated with specific sentences also help as augmentation? If so, how could that be implemented?
Is the pretrained language model loaded for IMDb composed only of the word embedding parameters, or does it also include the weights of the LSTM model?