Lesson 4 In-Class Discussion

charlielee · November 21, 2017, 4:54am

Check out LDA and/or NMF for topic modeling. I tried to use them to possibly generate some labels for a classification task.

yinterian · November 21, 2017, 4:54am

You can use embeddings for any NLP problem.

anamariapopescug · November 21, 2017, 4:55am

You’re learning an embedding matrix - so if you have this representation of words in a specific corpus, you can get “topics” by clustering the words. So you could learn topics for your domain, in that sense.
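A minimal sketch of the clustering idea above, in plain Python. The 2-D vectors, the word list, and the helper names (`dist`, `mean`, `kmeans`) are all made up for illustration; in practice the vectors would be rows of a trained embedding matrix, and a library k-means would be the usual choice.

```python
import math

# Toy 2-D "embeddings" (invented values, not from a trained model).
embeddings = {
    "goal":    [0.90, 0.10],
    "striker": [0.80, 0.20],
    "referee": [0.85, 0.15],
    "stock":   [0.10, 0.90],
    "bond":    [0.20, 0.80],
    "market":  [0.15, 0.95],
}

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(vectors, centroids, n_iter=10):
    """Plain k-means: assign each vector to its nearest centroid, then re-average."""
    centroids = [list(c) for c in centroids]  # work on a copy
    assignments = []
    for _ in range(n_iter):
        assignments = [
            min(range(len(centroids)), key=lambda k: dist(v, centroids[k]))
            for v in vectors
        ]
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == k]
            if members:
                centroids[k] = mean(members)
    return assignments

words = list(embeddings)
vectors = [embeddings[w] for w in words]

# Deterministic initialisation: start from the vectors for "goal" and "stock".
assignments = kmeans(vectors, [vectors[0], vectors[3]])

# Group words by cluster id; each group is a rough "topic".
topics = {}
for word, cluster in zip(words, assignments):
    topics.setdefault(cluster, []).append(word)

for cluster, members in sorted(topics.items()):
    print(cluster, members)
```

With real embeddings the clusters are rarely this clean, and the number of clusters has to be chosen by hand or by a separate criterion.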

pramod.srinivasan · November 21, 2017, 4:56am

fast.ai : SoTAs made simple

SoTA : State-of-the-Art

Chris_Palmer · November 21, 2017, 4:56am

New question - not specific to NLP. Jeremy just stated that his “best epoch” was around 10, on a process that went to 16 epochs - how can you pick the model at that 10th epoch - do you run it all over again and tell it to run for just 10 epochs? What if you wanted to use cycle_mult, which will take you past 10?

ecdrid · November 21, 2017, 4:57am

Use the cycle_save_name parameter.
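Independent of any particular library, the general idea behind picking the "best epoch" without retraining is to keep a copy of the weights whenever the validation loss improves. The sketch below is a hedged illustration only: `train_keep_best` is a hypothetical helper, the dictionary stands in for real model weights, and the loss values are invented to mimic a 16-epoch run whose best epoch is the 10th.

```python
import copy

def train_keep_best(model_state, val_losses):
    """Keep a snapshot of the model state from the epoch with the lowest validation loss.

    `val_losses` stands in for running real validation after each epoch;
    `model_state` stands in for real model weights.
    """
    best_loss, best_epoch, best_state = float("inf"), None, None
    for epoch, loss in enumerate(val_losses, start=1):
        model_state["epochs_trained"] = epoch  # placeholder for one epoch of training
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            best_state = copy.deepcopy(model_state)  # snapshot, like saving a checkpoint
    return best_epoch, best_loss, best_state

# Invented validation losses for 16 epochs; the minimum is at epoch 10.
losses = [0.90, 0.70, 0.55, 0.50, 0.48, 0.47, 0.46, 0.45,
          0.44, 0.40, 0.42, 0.43, 0.45, 0.46, 0.47, 0.48]

best_epoch, best_loss, best_state = train_keep_best({"epochs_trained": 0}, losses)
print(best_epoch, best_loss, best_state)
```

Saving at the end of each cycle works the same way, except that the snapshots are written to disk and you reload whichever one scored best.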

jenna · November 21, 2017, 4:57am

Who is Jeremy talking about right now? I didn’t catch the name

ecdrid · November 21, 2017, 4:58am

aloisius · November 21, 2017, 4:58am

Sebastian Ruder?

yinterian · November 21, 2017, 4:58am

I think it’s this guy: http://ruder.io/

vikbehal · November 21, 2017, 5:01am

Brilliant! So much to digest!

arjunrajkumar · November 21, 2017, 5:02am

True… Mind is blown.

anandsaha · November 21, 2017, 5:02am

Yes, it will be a fun week. And it was amazing to see SoTA results right here, right now!

Ray2 · November 21, 2017, 5:07am

Using a language model to pretrain a classifier is a brilliant idea!
Big thanks to @jeremy!

zaoyang · November 21, 2017, 5:18am

Why are embedding sizes typically between 50 and 600? With such a high cardinality (e.g. 34,945), wouldn’t you use a size of the same order of magnitude?

vikbehal · November 21, 2017, 5:20am

To save computation? Maybe it’s like dimensionality reduction?
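To put rough numbers on the computation argument: an embedding matrix has one row per vocabulary item and one column per embedding dimension, so its parameter count is the product of the two. The sketch below uses the cardinality quoted above; `embedding_params` is just an illustrative helper.

```python
vocab_size = 34_945  # cardinality quoted in the question above

def embedding_params(vocab_size, emb_dim):
    """Number of weights in an embedding matrix of shape (vocab_size, emb_dim)."""
    return vocab_size * emb_dim

# Compare the typical range (50 to 600) with an embedding as wide as the vocabulary.
for emb_dim in (50, 600, vocab_size):
    print(f"emb_dim={emb_dim:>6}: {embedding_params(vocab_size, emb_dim):,} parameters")
```

An embedding as wide as the vocabulary would need over a billion weights for this one layer, versus roughly 1.7 to 21 million in the usual range, and it would give the model no pressure to place similar words near each other.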

zaoyang · November 21, 2017, 5:21am

Why does the ratio of dropout matter in the context of NLP? Is there an intuition behind it in the context of this paper?

zaoyang · November 21, 2017, 5:24am

Is learn.clip=0.3 clipping the gradient? I don’t understand how it works. Does it just limit the derivative that controls how the “ball” rolls down the hill?
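A hedged sketch of one common form of gradient clipping, clipping by norm (the same idea as PyTorch's `torch.nn.utils.clip_grad_norm_`). Whether `learn.clip` does exactly this is an assumption here, and `clip_grad_norm` below is a hand-written stand-in operating on a plain list of numbers, not a library call.

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale `grads` so their L2 norm is at most `max_norm`.

    The direction of the gradient is unchanged; only its length is capped.
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

# A large gradient (norm 5.0) is shrunk to norm 0.3.
clipped, norm_before = clip_grad_norm([3.0, 4.0], max_norm=0.3)
print(norm_before, clipped)

# A small gradient (norm below 0.3) passes through unchanged.
unchanged, small_norm = clip_grad_norm([0.1, 0.2], max_norm=0.3)
print(small_norm, unchanged)
```

In the "ball on a hill" picture, this caps how far the ball can move in a single step on very steep slopes, without changing which way it rolls.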

zaoyang · November 21, 2017, 5:25am

How is word2vec different from embeddings?

vikbehal · November 21, 2017, 5:26am

I guess that’s how Jeremy presented it in this lecture. This will become clearer in future lessons.
When we have a large ‘lr’, maybe clip helps ensure we don’t overshoot the minimum?