Lesson 8 - Official topic

You should open another topic for this, and explain the code you are running.

ok, sounds good.

How does this change or work when applied to prescriptive texts, say recipes (a corpus of recipes) or protocols/checklists? These are no longer a simple case of predicting the next word given the previous ones.

1 Like

I think the combination of perplexity and accuracy is cool. “Perplexity is the probability of the correct label (word), passed through a log function, negated, and exponentiated ( e ^ x ). Perplexity is high when two probability distributions don’t match, and it’s low (approaching 1) when they do match.” From Trask, A. (2019). Grokking Deep Learning. Manning Publications, New York, USA.
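In other words, perplexity is just the exponential of the cross-entropy loss. A minimal sketch in PyTorch, with made-up logits and target purely for illustration:

```python
import torch
import torch.nn.functional as F

# Toy example: logits over a 5-word vocabulary and the index of the correct next word.
logits = torch.tensor([[2.0, 0.5, 0.1, -1.0, 0.3]])
target = torch.tensor([0])

# Cross-entropy is -log p(correct word); perplexity is its exponential.
loss = F.cross_entropy(logits, target)
perplexity = torch.exp(loss)
print(loss.item(), perplexity.item())
# The closer p(correct word) is to 1, the closer perplexity gets to 1;
# the further off the prediction is, the higher it climbs.
```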

Could these techniques be used for editorial purposes? I’m thinking of the transcriptions that we edit. If the model is able to predict the next word in the sentence, as well as possible punctuation marks, how would we build something which feeds in the mistake-filled transcription and outputs a much better one?

Could this be reformulated as a machine translation problem? The mistake-filled transcription is one language, the error-free one is another. You would likely need paired examples though.

That’s a seq2seq (sequence to sequence) task. It won’t be covered in this course, but there is a lot of research around it.
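If you want to play with the idea anyway, here is a minimal sketch of the seq2seq framing using a pretrained text-to-text model from Hugging Face transformers. The checkpoint (“t5-small”), the example pair, and the single-pair loss are placeholders for illustration, not anything from the lesson:

```python
from transformers import T5TokenizerFast, T5ForConditionalGeneration

# Hypothetical paired example: noisy transcription -> cleaned-up version.
noisy = "so um the meeting is uh on tuesday i think"
clean = "So, the meeting is on Tuesday, I think."

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The model maps one sequence to another: encode the noisy input,
# use the clean version as the target labels.
inputs = tokenizer(noisy, return_tensors="pt")
labels = tokenizer(clean, return_tensors="pt").input_ids

loss = model(**inputs, labels=labels).loss  # minimise this over many pairs
print(loss.item())

# After fine-tuning, generate a cleaned-up version of a transcription.
generated = model.generate(**inputs, max_length=40)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```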

2 Likes

Can someone explain how a model trained to predict the next word in a sentence can generalise to classify sentiment? They seem like different domains. Thanks!

3 Likes

Does the sequence length of the language model affect the memory/GPU usage when training the classifier?

Is there an intuition for why reversing the texts and training on them helps the model learn better?

1 Like

Dumb question: What would data augmentation look like for NLP?

5 Likes

Rachel’s mic is not working properly?

7 Likes

Yes, a lower sequence length will use less memory.
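As a rough sketch (assuming the fastai v2 API used in the course notebooks, where seq_len and bs are passed when building the DataLoaders): each language-model batch is a (bs, seq_len) block of tokens, so lowering either value shrinks the activations held on the GPU.

```python
from fastai.text.all import *

path = untar_data(URLs.IMDB)

# Each batch is bs x seq_len tokens; halving seq_len roughly halves the
# activation memory per batch (the default seq_len is 72).
dls_lm = TextDataLoaders.from_folder(
    path, is_lm=True, valid_pct=0.1, bs=64, seq_len=36)
```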

1 Like

What are some techniques for data augmentation on text? Selecting synonyms?

2 Likes

Question from Shim San of the TWiML study group: Other than translation, what other data augmentation approaches are there for NLP?

Are the aug_transforms in NLP similar to the ones in vision? Do they also run on the GPU, like the vision transforms?

At a high level, what will change when we use GPT-2, BERT, or ALBERT in the lesson 10 notebook? Does it just change the tokenizer we are using now? http://dev.fast.ai/tutorial.transformers

2 Likes

I don’t think Jeremy said that. You could combine the predictions of a model trained to classify sentences in the right order with another one trained on reversed sentences, and get a better model by averaging their predictions.
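A rough sketch of that ensembling idea in fastai, assuming you already have a forward classifier learn_fwd and a backwards one learn_bwd trained on the same split (both names are placeholders):

```python
from fastai.text.all import *

# Predictions from the forward-text classifier and the reversed-text one.
# Both validation sets come from the same split, so the targets line up.
preds_fwd, targs = learn_fwd.get_preds()
preds_bwd, _     = learn_bwd.get_preds()

# Average the predicted probabilities and score the ensemble.
preds_avg = (preds_fwd + preds_bwd) / 2
print(accuracy(preds_avg, targs))
```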

1 Like

One cool approach people use for data augmentation in NLP is back-translation: run the text through an automated translation service into another language and then back into the original one (e.g. English -> French -> English). You get almost the same text, just worded slightly differently.
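A minimal sketch of that back-translation trick with the Hugging Face MarianMT checkpoints (the Helsinki-NLP model names are just one choice of translation service, not something from the lesson):

```python
from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    # Load a pretrained translation model and translate a batch of sentences.
    tok = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tok(texts, return_tensors="pt", padding=True)
    out = model.generate(**batch)
    return tok.batch_decode(out, skip_special_tokens=True)

original = ["This movie was surprisingly good, I loved the ending."]
french = translate(original, "Helsinki-NLP/opus-mt-en-fr")
back = translate(french, "Helsinki-NLP/opus-mt-fr-en")
print(back)  # almost the same sentence, worded slightly differently
```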

3 Likes

Yes, you should use the same tokenizer that was used to pretrain the model.
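For example, in the spirit of the transformers tutorial linked above, the tokenizer and the weights are loaded from the same pretrained checkpoint so the token ids match what the model saw during pretraining (a minimal sketch, using the “gpt2” checkpoint as in that tutorial):

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Same checkpoint name for both the tokenizer and the pretrained weights.
pretrained = "gpt2"
tokenizer = GPT2TokenizerFast.from_pretrained(pretrained)
model = GPT2LMHeadModel.from_pretrained(pretrained)

ids = tokenizer("fastai is a deep learning library", return_tensors="pt")
out = model.generate(ids["input_ids"], max_length=20)
print(tokenizer.decode(out[0]))
```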

1 Like