At the end of lesson four @jeremy said we can pre-train a language model on a Wikipedia corpus, then fine-tune it as an IMDB language model, and then do sentiment analysis on top of that. My questions are:
What if some words in the IMDB vocab do not exist in the Wikipedia vocab?
Why don't we add the IMDB reviews to the Wikipedia corpus and create one bigger corpus from the beginning, instead of first training on Wikipedia and then fine-tuning on IMDB?
When we want to try this text classification on some small dataset of our own, one too small to train a good language model on by itself, should we add our corpus to an existing big corpus (like Wikipedia or IMDB or something else), or should we first train on the big corpus and then fine-tune the language model on our dataset? And what if some of our words do not exist in the big corpus we used?
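To make question 1 concrete: my understanding is that tokens missing from the pretrained vocab typically get mapped to an unknown token during numericalization, which is the part I'm unsure about. A minimal sketch of what I mean (all names here are my own, not fastai's actual API):

```python
# Sketch of out-of-vocab handling when reusing a pretrained vocab.
# Hypothetical helper, not fastai's real implementation.

UNK = "<unk>"

def numericalize(tokens, vocab):
    """Map each token to its index in the pretrained vocab,
    falling back to the unknown token's index for OOV words."""
    index = {tok: i for i, tok in enumerate(vocab)}
    unk_idx = index[UNK]
    return [index.get(tok, unk_idx) for tok in tokens]

wiki_vocab = [UNK, "the", "movie", "was", "good"]
imdb_tokens = ["the", "movie", "was", "unwatchable"]  # "unwatchable" is OOV
print(numericalize(imdb_tokens, wiki_vocab))  # -> [1, 2, 3, 0]
```

If it works like this, every IMDB-only word would collapse onto the same `<unk>` embedding, which is what worries me.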