Over the last year I’ve been working a lot with doc2vec and word2vec, so I was intrigued by @jeremy’s comment (here) that:
[…] in NLP I’m really pushing the idea that [we] need to move past word2vec, glove […] because those embeddings are way less predictive than embeddings learnt from deep models […]
Could you please elaborate on this? Are you basically saying “forget about word2vec, simply start with a random embedding layer (wired into an RNN or whatever) and let it converge as part of the overall network training”?
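Just to make sure I'm picturing this right, here's a minimal PyTorch sketch of what I mean (the model, dimensions, and data here are all made up for illustration): the `nn.Embedding` matrix starts from random weights and is updated by backprop together with the rest of the network, with no pretrained word2vec/GloVe vectors involved.

```python
import torch
import torch.nn as nn

# Toy setup: small vocabulary, short sequences, binary labels.
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 100, 16, 32

class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Randomly initialized embedding -- no pretrained vectors loaded.
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, 2)

    def forward(self, token_ids):
        vectors = self.embed(token_ids)    # (batch, seq, embed)
        _, hidden = self.rnn(vectors)      # final hidden state
        return self.out(hidden.squeeze(0)) # (batch, 2) logits

model = TinyClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# One training step on random data: the embedding weights receive
# gradients and change, i.e. they are learned as part of the task.
tokens = torch.randint(0, VOCAB_SIZE, (8, 5))
labels = torch.randint(0, 2, (8,))
before = model.embed.weight.clone()
loss = nn.functional.cross_entropy(model(tokens), labels)
loss.backward()
optimizer.step()
changed = not torch.allclose(before, model.embed.weight)
```

(As opposed to freezing pretrained vectors via something like `nn.Embedding.from_pretrained(w2v_matrix, freeze=True)`.)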
I think this is the key part.
If you are into NLP, the model introduced towards the end of lecture 4 and elaborated on in this paper is mind-blowing. If I had more time, or were not such a newb, I would jump into this immediately. But first I need to do a bit more with CV in preparation for part 2, and also to get a bit of that newb out of my system.
But I'm really, really hoping I'll have a chance to play around with this in 2, maybe 3 weeks.