As the title suggests, can someone please explain how language models and word embeddings (especially ones generated by word2vec) differ? Are they the same thing? If not, how are they different, and where would you use one vs. the other?
From Jeremy’s and Sebastian’s blog post:
Very simple transfer learning using just a single layer of weights (known as embeddings) has been extremely popular for some years, such as the word2vec embeddings from Google. However, full neural networks in practice contain many layers, so only using transfer learning for a single layer was clearly just scratching the surface of what’s possible.
A language model is an NLP model which learns to predict the next word in a sentence. For instance, if your mobile phone keyboard guesses what word you are going to want to type next, then it’s using a language model.
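To make "predict the next word" concrete, here is a minimal sketch of the idea using plain bigram counts. This is not how a neural language model (or the one in the blog post) works internally; the function names and the toy corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for every word, which words follow it in the corpus."""
    next_word_counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for current, following in zip(tokens, tokens[1:]):
            next_word_counts[current][following] += 1
    return next_word_counts


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented for this example.
corpus = [
    "i want to go home",
    "i want to eat",
    "i want to go out",
]
model = train_bigram_model(corpus)
suggestion = predict_next(model, "to")
print(suggestion)
```

Note that word order is baked into the counts: the model stores what comes after "to", not what merely appears near it.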
And from this Quora question:
It is probably not fair to compare word2vec with an n-gram language model because in a language model word order matters whereas word2vec model does not consider word order at all (all it does during training is to predict neighboring words within a window regardless of their order).
- Word2vec output, which is just word vectors, captures semantic similarity between words.
- Word2vec and language models are almost complementary in a sense.
A language model can benefit from a word2vec model's output: a language model that uses word vectors generated by a word2vec model may outperform one that just initializes its word vectors randomly before training.
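The "neighboring words within a window regardless of their order" point can be sketched by generating skip-gram style (center, context) training pairs. This is only the pair-generation step under simplifying assumptions; real word2vec implementations add things like subsampling and negative sampling, and `skipgram_pairs` is a hypothetical helper, not a library function.

```python
def skipgram_pairs(tokens, window=2):
    """Build (center, context) pairs from a symmetric window around each word."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                # Left and right neighbors are treated identically.
                pairs.append((center, tokens[j]))
    return pairs


tokens = "the cat sat on the mat".split()
pairs = skipgram_pairs(tokens, window=1)

# Reversing the sentence yields the same set of training pairs,
# which is why this objective carries no notion of word order.
reversed_pairs = skipgram_pairs(tokens[::-1], window=1)
same_pairs = set(pairs) == set(reversed_pairs)
print(same_pairs)
```

A next-word predictor would behave very differently on the reversed sentence, which is the contrast the quote is drawing.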
Is it fair to say then that Word2Vec is a type of language model?
My understanding is that the purpose of Word2Vec is to represent words as dense vectors of real numbers. And from Jeremy’s explanation, the purpose of a language model is, in a way, to generate text, with the by-product that the lower-level (or levels?) weight matrices can be considered the embeddings of the words. If that is so, what is the difference between Word2Vec and a language model? I’m still trying to wrap my head around this because I think it is a fundamental concept that I’m not understanding.
This has been discussed at great length in the Part 2, Lesson 10 thread.
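On the idea that the lowest weight matrix can be read as word embeddings, here is a small sketch of why: multiplying a one-hot word vector by the first-layer weight matrix just selects one row of that matrix. The vocabulary and the numbers in `W` are invented for illustration and do not come from any trained model.

```python
vocab = ["cat", "dog", "mat"]
word_to_index = {word: i for i, word in enumerate(vocab)}

# Hypothetical first-layer weights: one row per word, 2 dimensions each.
W = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]


def one_hot(index, size):
    """Vector of zeros with a single 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def first_layer_output(word):
    """Multiply the one-hot input by W, as a dense first layer would."""
    x = one_hot(word_to_index[word], len(vocab))
    return [
        sum(x[i] * W[i][j] for i in range(len(vocab)))
        for j in range(len(W[0]))
    ]


def embedding_lookup(word):
    """Read the word's row of W directly."""
    return W[word_to_index[word]]


print(first_layer_output("dog"), embedding_lookup("dog"))
```

Both calls give the same vector, so "the first layer's weights" and "the embedding table" are two views of the same matrix. Whether that matrix was trained with a word2vec objective or a next-word objective is a separate question.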
I’m still working through the videos. Is this addressed in the lesson 10 video as well?
Yes! Discussed multiple times in the video.