I’m trying to train a word2vec model in a specific business domain. I’ve collected a relatively large corpus of domain-specific text, trained a model using
gensim’s basic functionality, and got decent results (in the sense that words that are semantically similar in my domain, but not in general English, are embedded more closely to each other than they are in the Google News model).
I’m trying to improve my model, and what I had in mind is some form of transfer learning. I thought about fine-tuning Google’s word2vec: extending its vocabulary and continuing training on my domain-specific corpus.
My question is: can I somehow use the method introduced in the ULMFiT paper to fine-tune a word2vec model? The two problems seem very similar to me, but I’m not sure how to go about it. I’d appreciate any thoughts/ideas/references on this (the most similar question I’ve found on the forum is this).