Lesson 4 In-Class Discussion

Check out LDA and/or NMF for topic modeling. I tried to use them to possibly generate some labels for a classification task.

You can use embeddings for any NLP problem.

You’re learning an embedding matrix - so if you have this representation of words in a specific corpus, you can get “topics” by clustering the words. So you could learn topics for your domain, in that sense.
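A minimal sketch of that clustering idea, assuming you have already pulled the learned embedding matrix out of the model; the names emb and itos (index-to-word list) are placeholders, filled with random data here just so the snippet runs:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical inputs: `emb` is the learned embedding matrix (vocab_size x emb_dim)
# extracted from the trained model, and `itos` maps each row index back to its word.
# Random placeholders stand in here.
vocab_size, emb_dim = 34945, 300
emb = np.random.randn(vocab_size, emb_dim)
itos = [f'word_{i}' for i in range(vocab_size)]

n_topics = 20
km = KMeans(n_clusters=n_topics, random_state=0).fit(emb)

# Treat each cluster as a rough "topic": show the words whose vectors
# sit closest to each cluster centre.
for k in range(n_topics):
    dists = np.linalg.norm(emb - km.cluster_centers_[k], axis=1)
    print(k, [itos[i] for i in dists.argsort()[:10]])
```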

fast.ai : SoTAs made simple


SoTA : State-of-the-Art

3 Likes

New question, not specific to NLP: Jeremy just said his “best epoch” was around 10 on a run that went to 16 epochs. How can you pick the model from that 10th epoch? Do you run it all over again and stop at 10 epochs? What if you want to use cycle_mult, which will take you past 10?

Use the cycle_save_name parameter.

2 Likes
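If I remember the old fastai (0.7) API correctly, that argument lives on fit and saves the weights at the end of every cycle, so you can reload whichever cycle scored best; treat the exact names and the saved-file naming below as assumptions:

```python
# Assumed fastai 0.7-style call: keep cycle_mult and still be able to pick
# the best cycle afterwards, because weights are written out per cycle.
learn.fit(1e-3, 4, cycle_len=1, cycle_mult=2,
          cycle_save_name='lm_cycles')   # assumed to write lm_cycles_cyc_0, _cyc_1, ...
learn.load('lm_cycles_cyc_2')            # reload the cycle that scored best
```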

Who is Jeremy talking about right now? I didn’t catch the name

2 Likes

Sebastian Ruder?

1 Like

I think it’s this guy: http://ruder.io/

8 Likes

Brilliant! So much to digest! :smiley:

9 Likes

True… Mind is blown.

Yes, it will be a fun week :slight_smile: And it was amazing to see SoTA results right here right now!

3 Likes

Using a language model to pretrain a classifier is a brilliant idea!
Big thanks to @jeremy!

2 Likes
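Roughly the workflow from the lesson, sketched from memory; the learner names are placeholders and the save_encoder / load_encoder method names are my assumption about the fastai API:

```python
# Sketch of the idea: (1) fit a language model on the unlabelled corpus,
# (2) keep only its encoder (embeddings + RNN), (3) attach a classification
# head and fine-tune on the labelled data.
lm_learner.fit(1e-3, 1, cycle_len=10)
lm_learner.save_encoder('lm_enc')    # save embeddings + RNN weights (assumed API)

clf_learner.load_encoder('lm_enc')   # reuse them in the classifier (assumed API)
clf_learner.fit(1e-3, 1, cycle_len=10)
```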

Why are embeddings typically between 50 and 600 dimensions? If you have such high cardinality (e.g. 34,945 words), wouldn’t you use the same order of magnitude?

To save computation? Maybe it’s a kind of dimensionality reduction?
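One way to see the computation point: the embedding layer holds vocab_size × embedding_dim parameters, so the width matters a lot. A small PyTorch illustration (34,945 is just the vocab size from the question):

```python
import torch.nn as nn

vocab_size = 34945
emb_small = nn.Embedding(vocab_size, 300)          # 300-d vectors
emb_huge  = nn.Embedding(vocab_size, vocab_size)   # same order of magnitude as the vocab

print(sum(p.numel() for p in emb_small.parameters()))  # 10,483,500 parameters (~10.5M)
print(sum(p.numel() for p in emb_huge.parameters()))   # 1,221,153,025 parameters (~1.2B)
```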

Why does the ratio of dropout matter in the context of NLP? Is there an intuition behind it in the context of this paper?

2 Likes
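For what it’s worth, the intuition I took away is that the language model has several dropout probabilities (embedding, input, hidden, weight, output), and they are kept in a fixed ratio to each other while a single multiplier is tuned up or down depending on over/underfitting. A toy sketch of that idea, with made-up numbers:

```python
import numpy as np

# Illustrative only: keep the relative ratios between the model's various
# dropout probabilities fixed, and tune one overall multiplier -- raise it
# if the model overfits, lower it if it underfits.
base_drops = np.array([0.25, 0.1, 0.2, 0.02, 0.15])  # made-up base values
multiplier = 0.7
drops = base_drops * multiplier
print(drops)
```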

learn.clip=0.3 is clipping the gradient? I don’t understand how it works. Does it just limit the derivative, i.e. how fast the “ball” rolls down the hill?

How is word2vec different from embeddings?

I guess that’s just how Jeremy presented it in this lecture; it will become clearer in future lessons.
When we have a large learning rate, does clip help to ensure we don’t miss the minima?
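For concreteness, that kind of clipping caps the overall gradient norm before the optimizer step, so one huge gradient can’t fling the weights far away even with a large learning rate. A generic, self-contained PyTorch sketch of the same idea (not the fastai internals, just the standard utility):

```python
import torch
import torch.nn as nn

# Minimal sketch of gradient clipping: cap the gradient norm at 0.3 before
# the optimizer step, limiting how far a single steep spot on the loss
# surface can push the weights.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)

loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.3)
optimizer.step()
```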