Lesson 8 - Official topic

You should open another topic for this, and explain the code you are running.

ok, sounds good.

How does this change or work when applied to prescriptive texts, say recipes (a corpus of recipes) or protocols/checklists? These are no longer a simple case of predicting the next word given the previous ones.

1 Like

I think the combination of perplexity and accuracy is cool. “Perplexity is the probability of the correct label (word), passed through a log function, negated, and exponentiated ( e ^ x ). Perplexity is high when two probability distributions don’t match, and it’s low (approaching 1) when they do match.” From Trask, A. (2019). Grokking Deep Learning. Manning Publications, New York, USA.
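In other words, perplexity is just the exponential of the cross-entropy loss. A minimal sketch in PyTorch, with made-up logits and target purely for illustration:

```python
import torch
import torch.nn.functional as F

# Toy example: logits over a 5-word vocabulary and the index of the correct next word.
logits = torch.tensor([[2.0, 0.5, 0.1, -1.0, 0.3]])
target = torch.tensor([0])

# Cross-entropy is -log p(correct word); perplexity is its exponential.
loss = F.cross_entropy(logits, target)
perplexity = torch.exp(loss)
print(loss.item(), perplexity.item())
# The closer p(correct word) is to 1, the closer perplexity gets to 1;
# the further off the prediction is, the higher it climbs.
```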

Could these techniques be used for editorial purposes? I’m thinking of the transcriptions that we edit. If the model is able to predict the next word in the sentence, as well as possible punctuation marks, how would we build something which feeds in the mistake-filled transcription and outputs a much better one?

Could this be reformulated as a machine translation problem? The mistake-filled transcription is one language, the error-free one is another. You would likely need paired examples though.

That’s a seq2seq (sequence to sequence) task. It won’t be covered in this course, but there is a lot of research around it.
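If you want to play with the idea anyway, here is a minimal sketch of the seq2seq framing using a pretrained text-to-text model from Hugging Face transformers. The checkpoint (“t5-small”), the example pair, and the single-pair loss are placeholders for illustration, not anything from the lesson:

```python
from transformers import T5TokenizerFast, T5ForConditionalGeneration

# Hypothetical paired example: noisy transcription -> cleaned-up version.
noisy = "so um the meeting is uh on tuesday i think"
clean = "So, the meeting is on Tuesday, I think."

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The model maps one sequence to another: encode the noisy input,
# use the clean version as the target labels.
inputs = tokenizer(noisy, return_tensors="pt")
labels = tokenizer(clean, return_tensors="pt").input_ids

loss = model(**inputs, labels=labels).loss  # minimise this over many pairs
print(loss.item())

# After fine-tuning, generate a cleaned-up version of a transcription.
generated = model.generate(**inputs, max_length=40)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```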

2 Likes

Can someone explain how a model trained to predict the next word in a sentence can generalise to classify sentiment? They seem like different domains. Thanks!

3 Likes

Does the sequence length of the language model affect the memory/GPU usage when training the classifier?

Is there an intuition for why reversing the texts and training on them helps the model learn better?

1 Like

Dumb question: What would data augmentation look like for NLP?

5 Likes

Rachel’s mic is not working properly?

7 Likes

Yes, a lower sequence length will use less memory.
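As a rough sketch (assuming the fastai v2 API used in the course notebooks, where seq_len and bs are passed when building the DataLoaders): each language-model batch is a (bs, seq_len) block of tokens, so lowering either value shrinks the activations held on the GPU.

```python
from fastai.text.all import *

path = untar_data(URLs.IMDB)

# Each batch is bs x seq_len tokens; halving seq_len roughly halves the
# activation memory per batch (the default seq_len is 72).
dls_lm = TextDataLoaders.from_folder(
    path, is_lm=True, valid_pct=0.1, bs=64, seq_len=36)
```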

1 Like

What are some techniques for data augmentation on text? Selecting synonyms?

2 Likes

Question from Shim San of the TWiML study group: Other than translation, what other data augmentation approaches are there for NLP?

Are the aug_transforms in NLP similar to the ones in vision? Do they also run on the GPU, like the vision transforms?

At a high level, what will change when we use GPT-2, BERT, or ALBERT in the lesson 10 notebook? Does it just change the tokenizer we are using now? http://dev.fast.ai/tutorial.transformers

2 Likes

I don’t think Jeremy said that. You could combine the predictions of a model trained to classify sentences in the right order with another one trained on reversed sentences, and get a better model by averaging their predictions.
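A rough sketch of that ensembling idea in fastai, assuming you already have a forward classifier learn_fwd and a backwards one learn_bwd trained on the same split (both names are placeholders):

```python
from fastai.text.all import *

# Predictions from the forward-text classifier and the reversed-text one.
# Both validation sets come from the same split, so the targets line up.
preds_fwd, targs = learn_fwd.get_preds()
preds_bwd, _     = learn_bwd.get_preds()

# Average the predicted probabilities and score the ensemble.
preds_avg = (preds_fwd + preds_bwd) / 2
print(accuracy(preds_avg, targs))
```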

1 Like

One cool approach people use for data augmentation in NLP is back-translation: run the text through an automated translation service into another language and then back into the original one (e.g. English -> French -> English). You get almost the same text, just worded slightly differently.
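A minimal sketch of that back-translation trick with the Hugging Face MarianMT checkpoints (the Helsinki-NLP model names are just one choice of translation service, not something from the lesson):

```python
from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    # Load a pretrained translation model and translate a batch of sentences.
    tok = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tok(texts, return_tensors="pt", padding=True)
    out = model.generate(**batch)
    return tok.batch_decode(out, skip_special_tokens=True)

original = ["This movie was surprisingly good, I loved the ending."]
french = translate(original, "Helsinki-NLP/opus-mt-en-fr")
back = translate(french, "Helsinki-NLP/opus-mt-fr-en")
print(back)  # almost the same sentence, worded slightly differently
```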

3 Likes

Yes, you should use the same tokenizer that was used to pretrain the model.
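For example, in the spirit of the transformers tutorial linked above, the tokenizer and the weights are loaded from the same pretrained checkpoint so the token ids match what the model saw during pretraining (a minimal sketch, using the “gpt2” checkpoint as in that tutorial):

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Same checkpoint name for both the tokenizer and the pretrained weights.
pretrained = "gpt2"
tokenizer = GPT2TokenizerFast.from_pretrained(pretrained)
model = GPT2LMHeadModel.from_pretrained(pretrained)

ids = tokenizer("fastai is a deep learning library", return_tensors="pt")
out = model.generate(ids["input_ids"], max_length=20)
print(tokenizer.decode(out[0]))
```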

1 Like