Over the last year I’ve been working a lot with doc2vec and word2vec, so I was intrigued by @jeremy’s comment (here) that:
[…] in NLP I’m really pushing the idea that need to move past word2vec, glove […] because those embeddings are way less predictive than embeddings learnt from deep models […]
Could you please elaborate on this? Are you basically saying “forget about word2vec, simply start with a random embedding layer (wired into an RNN or whatever) and let it converge as part of the overall network training”?
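To make sure I understand, here is roughly the setup I have in mind (a minimal PyTorch sketch, just to illustrate the question; the model shape and hyperparameters are placeholders):

```python
import torch
import torch.nn as nn

# Randomly initialised embedding layer feeding an RNN, with the embeddings
# trained jointly with the rest of the network (no pretrained word2vec/glove
# weights loaded).
class RNNClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)  # random init, learned end-to-end
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_classes)

    def forward(self, token_ids):
        x = self.emb(token_ids)      # (batch, seq_len, emb_dim)
        _, (h, _) = self.rnn(x)      # final hidden state
        return self.out(h[-1])

model = RNNClassifier(vocab_size=20_000)
opt = torch.optim.Adam(model.parameters())  # embedding weights are updated too
```

Is this the kind of thing you mean, as opposed to initialising `self.emb` from word2vec/glove vectors?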
Thanks!