Lesson 4 In-Class Discussion

(Charles C. Lee) #226

Check out LDA and/or NMF for topic modeling. I tried using them to generate candidate labels for a classification task.
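To make the idea concrete, here is a minimal NMF sketch in plain NumPy (multiplicative updates) on a toy term-count matrix; the corpus and vocabulary are invented for illustration, and a real pipeline would use a library implementation such as scikit-learn's:

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Tiny NMF via multiplicative updates: V (docs x terms) ~ W @ H."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy corpus with two obvious "topics" (sports vs. cooking terms).
vocab = ["goal", "team", "score", "oven", "recipe", "bake"]
V = np.array([
    [3, 2, 2, 0, 0, 0],   # sports doc
    [2, 3, 1, 0, 0, 0],   # sports doc
    [0, 0, 0, 3, 2, 2],   # cooking doc
    [0, 0, 0, 1, 3, 3],   # cooking doc
], dtype=float)

W, H = nmf(V, k=2)
# Each row of H is a topic; its largest entries are the topic's top
# terms, which could serve as weak labels for a classifier.
for t in range(2):
    top = [vocab[i] for i in np.argsort(H[t])[::-1][:3]]
    print(f"topic {t}: {top}")
```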

(yinterian) #227

You can use embeddings for any NLP problem.

(anamariapopescug) #228

You’re learning an embedding matrix - so if you have this representation of words in a specific corpus, you can get “topics” by clustering the words. So you could learn topics for your domain, in that sense.
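A sketch of that clustering idea: treat each row of the embedding matrix as a word vector and cluster the rows. The tiny 2-D "embedding matrix" and word list below are invented for illustration, and the k-means here is a minimal hand-rolled version:

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Minimal k-means over the rows of X; returns a cluster id per row."""
    # Simple farthest-point init: start from row 0, then repeatedly add
    # the point farthest from all chosen centers.
    centers = [X[0]]
    while len(centers) < k:
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Hypothetical tiny embedding matrix: one row per word.
words = ["cat", "dog", "fish", "run", "jump", "swim"]
emb = np.array([
    [1.0, 0.1], [0.9, 0.2], [0.8, 0.0],   # animal-ish region
    [0.0, 1.0], [0.1, 0.9], [0.2, 1.1],   # action-ish region
])
labels = kmeans(emb, k=2)
for j in sorted(set(labels.tolist())):
    print("topic", j, [w for w, l in zip(words, labels) if l == j])
```

Each cluster of words then plays the role of a "topic" for that corpus.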

(Pramod) #229

fast.ai : SoTAs made simple

SoTA : State-of-the-Art

(Chris Palmer) #230

New question - not specific to NLP. Jeremy just stated that his “best epoch” was around 10, in a run that went to 16 epochs - how can you pick the model from that 10th epoch? Do you run it all over again and tell it to stop at 10 epochs? What if you wanted to use cycle_mult, which would take you past 10?

(Aditya) #231

Use the cycle_save_name parameter.
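Independent of fastai's cycle_save_name, the generic pattern is to checkpoint whenever validation loss improves, so the best epoch's weights survive even if later epochs get worse. A minimal sketch with simulated losses and a stand-in weights dict (all values here are hypothetical, chosen to mimic "best epoch ≈ 10 of 16"):

```python
import copy

# Simulated per-epoch validation losses (hypothetical): improving up to
# epoch 10, then overfitting afterwards.
val_losses = [0.9, 0.7, 0.6, 0.55, 0.52, 0.50, 0.49, 0.485, 0.48, 0.47,
              0.475, 0.48, 0.49, 0.50, 0.52, 0.55]

weights = {"epoch": 0}   # stand-in for real model weights
best = {"loss": float("inf"), "weights": None, "epoch": None}

for epoch, loss in enumerate(val_losses, start=1):
    weights["epoch"] = epoch            # pretend we trained this epoch
    if loss < best["loss"]:             # validation improved: checkpoint
        best = {"loss": loss,
                "weights": copy.deepcopy(weights),
                "epoch": epoch}

print(best["epoch"], best["loss"])  # the epoch-10 snapshot survives
```

Because the snapshot is taken inside the loop, this works unchanged with cycle_mult-style schedules that run past the best epoch.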


Who is Jeremy talking about right now? I didn’t catch the name

(Aditya) #233

(Jordan) #234

Sebastian Ruder?

(yinterian) #235

I think it’s this guy: http://ruder.io/

(Vikrant Behal) #236

Brilliant! So much to digest! :smiley:

(Arjun Rajkumar) #237

True… Mind is blown.

(Anand Saha) #238

Yes, it will be a fun week :slight_smile: And it was amazing to see SoTA results right here right now!

(Yihui Ray Ren) #239

Using a language model to pretrain a classifier is a brilliant idea!
Big thanks to @jeremy!

(Zao Yang) #240

Why are embedding dimensions typically between 50 and 600? If you have such high cardinality (i.e. 34,945), wouldn’t you use the same order of magnitude?
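One practical reason is sheer parameter count: an embedding layer has vocab_size × emb_dim weights, so matching the dimension to the cardinality would blow up memory and invite overfitting, while a 50-600-dimensional embedding stays tractable and forces a compressed, dense representation. A quick back-of-the-envelope check:

```python
# Parameter count of an embedding layer is vocab_size * emb_dim.
vocab_size = 34945   # the cardinality from the question above
counts = {d: vocab_size * d for d in (50, 300, 600, vocab_size)}
for d, p in counts.items():
    print(f"dim {d:>5}: {p:,} parameters")
# dim == vocab_size would mean ~1.2 billion parameters for the
# embedding alone, versus ~1.7M-21M in the usual 50-600 range.
```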

(Vikrant Behal) #241

To save computation? Maybe it’s like dimension reduction?

(Zao Yang) #242

Why does the ratio between the dropout rates matter in the context of NLP? Is there an intuition behind it in the context of this paper?
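As I understood the lecture, the AWD-LSTM applies dropout in several places, and the course scales all of those rates together by a single multiplier, so the ratios between them stay fixed while one knob tunes the overall regularization strength. A sketch of that scaling (the base values below are illustrative assumptions, not the lesson's exact numbers):

```python
import numpy as np

# Illustrative base dropout rates for the different places the
# AWD-LSTM applies dropout (embedding, input, hidden, weight, output);
# these specific values are assumptions, not the lesson's exact ones.
base_drops = np.array([0.25, 0.1, 0.2, 0.02, 0.15])

for mult in (0.5, 0.7, 1.0):    # one knob tunes overall regularization
    drops = base_drops * mult   # ratios between the five rates unchanged
    print(mult, drops.round(3))
```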

(Zao Yang) #243

learn.clip=0.3 is clipping the gradient? I don’t understand how it works - does it just limit the derivative of how the “ball” rolls down the hill?
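Assuming learn.clip=0.3 does standard gradient-norm clipping, the intuition is roughly that: the ball still rolls in the same direction, but when the slope is extremely steep the step size is capped so a single update can't fling it far past the minimum. A minimal NumPy sketch of that mechanism (the function name is mine, not fastai's):

```python
import numpy as np

def clip_grad_norm(grads, max_norm):
    """Rescale gradients so their global L2 norm is at most max_norm."""
    total = np.sqrt(sum((g ** 2).sum() for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads

g = [np.array([3.0, 4.0])]          # norm 5.0: a "steep cliff" gradient
clipped = clip_grad_norm(g, 0.3)    # direction kept, magnitude capped
print(np.linalg.norm(clipped[0]))   # approx 0.3
```

Gradients already below the threshold pass through untouched, so clipping only kicks in on the occasional exploding step.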

(Zao Yang) #244

How is word2vec different from embeddings?

(Vikrant Behal) #245

I guess that’s just how Jeremy presented it in this lecture. This will become clearer in future lessons.
When we have a large lr, clipping will help ensure we don’t overshoot the minima?