Check out LDA and/or NMF for topic modeling. I've tried using them to generate candidate labels for a classification task.
You can use embeddings for any NLP problem.
You’re learning an embedding matrix - so if you have this representation of words in a specific corpus, you can get “topics” by clustering the words. So you could learn topics for your domain, in that sense.
New question - not specific to NLP. Jeremy just said his "best epoch" was around 10, on a run that went to 16 epochs. How do you get the model from that 10th epoch - do you rerun the whole thing and tell it to stop at 10 epochs? And what if you wanted to use cycle_mult, which will take you past 10?
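One common answer is to checkpoint the weights whenever the validation loss improves, so the epoch-10 model is already saved when training ends at epoch 16 (in fastai, `learn.save` and the `cycle_save_name` argument to `fit` serve this purpose). A plain-Python sketch with a stand-in "model" dict and assumed per-epoch losses:

```python
# Track the best validation loss seen so far and snapshot the weights at
# that point; at the end, best_state holds the best epoch's model.
import copy

model = {"w": 0.0}                       # stand-in for real parameters
val_losses = [0.9, 0.7, 0.6, 0.65, 0.8]  # assumed losses, one per epoch

best_loss, best_state, best_epoch = float("inf"), None, -1
for epoch, loss in enumerate(val_losses):
    model["w"] += 1.0                    # pretend training step
    if loss < best_loss:
        best_loss, best_epoch = loss, epoch
        best_state = copy.deepcopy(model)  # snapshot best weights

print(best_epoch, best_state)  # → 2 {'w': 3.0}
```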
Who is Jeremy talking about right now? I didn’t catch the name
Brilliant! So much to digest!
True… Mind is blown.
Yes, it will be a fun week! And it was amazing to see SoTA results right here, right now!
Using a language model to pretrain a classifier is a brilliant idea!
Big thanks to @jeremy!
Why are embedding sizes typically between 50 and 600? If the vocabulary has such high cardinality (e.g. 34,945 words), wouldn't you use the same order of magnitude?
To save computation? Maybe it's a form of dimensionality reduction?
Why does the ratio of dropout matter in the context of NLP? Is there an intuition behind it in the context of this paper?
learn.clip=0.3 is clipping the gradient? I don't understand how it works. Does it just limit the derivative, i.e. how steeply the "ball" can roll down the hill?
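Roughly, yes: if the gradient's norm exceeds the threshold, it gets rescaled so the update step can't blow up. A pure-NumPy sketch of norm clipping (the function name and values are assumptions; in PyTorch this is `torch.nn.utils.clip_grad_norm_`):

```python
# Clip a gradient vector by its L2 norm: keep the direction, cap the
# magnitude at max_norm.
import numpy as np

def clip_grad(grad, max_norm=0.3):
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)  # same direction, capped length
    return grad

g = np.array([3.0, 4.0])        # norm 5.0, well above the cap
clipped = clip_grad(g)
print(np.linalg.norm(clipped))  # ≈ 0.3: direction kept, magnitude capped
```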
How is word2vec different from embeddings?
I guess that's just how Jeremy represented it in this lecture. This will become clearer in future lessons.
When we have a large lr, will clip help ensure we don't overshoot the minimum?