Check out LDA and/or NMF for topic modeling. I tried to use them to possibly generate some labels for a classification task.
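A minimal sketch of that idea, using scikit-learn's `LatentDirichletAllocation` (the documents and `n_components=2` are made-up toy choices, not anything from the lecture):

```python
# Hedged sketch: fit LDA on a toy corpus and use each document's
# dominant topic as a pseudo-label for a downstream classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock prices fell on the market",
    "investors traded shares on the exchange",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)                # doc-term count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)          # shape: (n_docs, n_topics)
labels = doc_topics.argmax(axis=1)         # dominant topic = pseudo-label
print(labels)
```

`NMF` from the same module is a drop-in alternative (it wants TF-IDF rather than raw counts).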
You can use embeddings for any NLP problem.
You're learning an embedding matrix - so if you have this representation of words in a specific corpus, you can get "topics" by clustering the words. So you could learn topics for your domain, in that sense.
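A toy sketch of that clustering idea, in pure Python. The embedding vectors and seed words here are invented for illustration; in practice you'd pull the rows out of the trained model's embedding layer:

```python
import math
from collections import defaultdict

# Toy "learned" embedding matrix: word -> vector. These numbers are
# made up; a real one would come from the trained embedding layer.
emb = {
    "cat":   [0.9, 0.1],
    "dog":   [0.8, 0.2],
    "stock": [0.1, 0.9],
    "share": [0.2, 0.8],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Simplest possible "clustering": assign each word to the nearest of
# two seed words by cosine similarity. Real code would use k-means.
seeds = ["cat", "stock"]
clusters = defaultdict(list)
for w, v in emb.items():
    best = max(seeds, key=lambda s: cosine(v, emb[s]))
    clusters[best].append(w)
print(dict(clusters))
```

Words that the model uses similarly end up in the same cluster, which is the "topic" in this sense.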
New question - not specific to NLP. Jeremy just stated that his "best epoch" was around 10, on a process that went to 16 epochs - how can you pick the model at that 10th epoch? Do you run it all over again and tell it to stop at 10 epochs? What if you wanted to use cycle_mult, which will take you past 10?
use the cycle_save_name parameter in learn.fit; it saves a checkpoint at the end of each cycle, so you can load the one from the best epoch afterwards
Who is Jeremy talking about right now? I didn't catch the name
Sebastian Ruder?
Brilliant! So much to digest!
True⌠Mind is blown.
Yes, it will be a fun week! And it was amazing to see SoTA results right here, right now!
Why are embeddings typically between 50 and 600 dimensions? If you have such a high cardinality (i.e. 34,945), wouldn't you use the same order of magnitude?
To save computation? Maybe it's like dimension reduction?
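Back-of-envelope arithmetic supports both points. Taking the 34,945-word vocab from the question and a 400-dim embedding (400 is just an example size in the 50-600 range):

```python
# Parameter count of the embedding matrix alone, for a small
# embedding dim vs. a dim matching the vocabulary cardinality.
vocab_size = 34_945
small_dim = 400            # typical embedding size (example choice)
huge_dim = 34_945          # "same order of magnitude as the vocab"

params_small = vocab_size * small_dim   # ~14M parameters
params_huge = vocab_size * huge_dim     # ~1.2B parameters
print(params_small, params_huge)
```

So matching the cardinality would blow up the parameter count by nearly 100x, and the low dimension is exactly a learned dimensionality reduction: similar words are forced to share directions in the small space.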
Why does the ratio of dropout matter in the context of NLP? Is there an intuition for it in the context of this paper?
learn.clip=0.3 is clipping the gradient? I don't understand how it works - does it just limit the derivative, i.e. how fast the "ball" rolls down the hill?
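Roughly, yes. A pure-Python sketch of gradient clipping by norm (this is the general technique; whether learn.clip clips by norm or by value internally is an assumption here):

```python
import math

def clip_gradient(grad, clip=0.3):
    """If the gradient's L2 norm exceeds `clip`, rescale it so the
    norm equals `clip`; the direction is preserved, only the step
    size is capped - the "ball" can't be launched across the valley
    by one steep region of the loss surface."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > clip:
        scale = clip / norm
        return [g * scale for g in grad]
    return grad

big = clip_gradient([3.0, 4.0])     # norm 5.0 -> rescaled to norm 0.3
small = clip_gradient([0.1, 0.1])   # norm ~0.14 -> left unchanged
print(big, small)
```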
How is word2vec different than embeddings?
Guess that's just how Jeremy represented it in this lecture. This will become clearer in future lessons.
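One way to see why they're the same kind of object: an embedding layer is just a lookup table of vectors. word2vec gives you such a table pretrained on a generic corpus with its own objective; the embeddings in this lesson are the same structure, but learned end-to-end on your task. A minimal sketch (all numbers made up):

```python
# An embedding "layer" is a matrix: row i is word i's vector.
vocab = {"the": 0, "cat": 1, "sat": 2}
embedding_matrix = [
    [0.1, 0.2, 0.3],   # "the"  (made-up numbers)
    [0.4, 0.5, 0.6],   # "cat"
    [0.7, 0.8, 0.9],   # "sat"
]

def embed(word):
    # Looking a word up = selecting its row. With word2vec the rows
    # are pretrained and fixed; here they are trained with the model.
    return embedding_matrix[vocab[word]]

print(embed("cat"))
```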
When we have a large lr, will clip help to ensure we don't miss the minima?