There is bias in language models. I’ve talked about it some here and here (including some approaches for dealing with it).
Can Jeremy mention what exact GPU he used for the IMDb notebook, especially considering that many people run out of GPU memory running it on a P4?
So is the wikitext-103 base model somewhat superior to other models, such as one trained on Reddit, because wikitext contains more English vocabulary?
I believe it was a 1080 Ti with 11GB.
Is the idea of a pre-trained language model similar to BERT, which I came across while looking into language models?
What if you want to keep the unknown words (rather than replacing them with xxunk)? As long as a word is used more than once, will it be kept?
The core idea is that even if wikitext is not that similar to the corpus you are interested in, you can still pre-train with wikitext and then fine-tune on the corpus you're interested in.
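In code, the workflow looks roughly like this: a minimal sketch using the fastai v1 API; the sample dataset (`URLs.IMDB_SAMPLE`), learning rates, and epoch counts are just illustrative.

```python
from fastai.text import *

# Sketch of the pre-train / fine-tune idea: start from the wikitext-103
# pretrained AWD-LSTM and fine-tune the language model on your own corpus.
path = untar_data(URLs.IMDB_SAMPLE)
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')

# language_model_learner loads the wikitext-103 pretrained weights by default
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.fit_one_cycle(1, 1e-2)            # fine-tune the newly added head first
learn.unfreeze()
learn.fit_one_cycle(3, 1e-3)            # then fine-tune the whole model
learn.save_encoder('fine_tuned_enc')    # the encoder can be reused for a classifier
```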
Is the fastai library available by default in Kaggle kernels? It seems they don't allow installing custom packages in GPU kernels. There is a Quora kernels-only competition, so I wonder if it is possible to use the library there.
How do you change the size of the vocabulary, say from 6000 to 8000?
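If I remember correctly, in fastai v1 you set this when building the DataBunch; a quick sketch (the sample data and numbers are just for illustration):

```python
from fastai.text import *

path = untar_data(URLs.IMDB_SAMPLE)   # sample data, just for illustration
# max_vocab and min_freq control the vocab size (defaults are 60000 and 2)
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv', max_vocab=8000, min_freq=2)
print(len(data_lm.vocab.itos))        # about 8000, plus a few special xx* tokens
```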
Can Jeremy confirm?
Please post this kind of question in the Advanced Section. But yes, BERT is another example of transfer learning in NLP, using a different backbone (a Transformer) and solving different tasks (masked language modelling + next-sentence prediction).
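For anyone curious, here is a minimal sketch of BERT's masked-language-model objective, assuming the Hugging Face `transformers` library is installed (it is not part of the fastai course):

```python
from transformers import pipeline

# Fill-in-the-blank with a pretrained BERT: the model predicts the [MASK] token
# from both left and right context, which is the masked LM pre-training task.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("The movie was absolutely [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```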
Do we build the vocab again from scratch, or do we use the vocab from the pre-trained wikitext model?
Wouldn't you be missing words if the vocab is limited to a small number?
Jeremy said the traditional approach is to convert everything to lower case. Then how do you predict case? Is there a separate model / technique for that?
You wouldn't be able to use the built-in language models for that competition, because external data is prohibited.
We build it from scratch, because we have new words that didn't exist on Wikipedia, and then we match the two vocabularies.
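Roughly, the matching works like this: a simplified sketch, not fastai's exact code. Pretrained embedding rows are copied for words shared with the wikitext vocab, and words that are new get the mean of the pretrained embeddings.

```python
import torch

def match_embeddings(old_wgts, old_itos, new_itos):
    "Map pretrained embedding rows onto a new vocab; unseen words get the mean row."
    old_stoi = {w: i for i, w in enumerate(old_itos)}
    mean = old_wgts.mean(0)
    new_wgts = old_wgts.new_zeros((len(new_itos), old_wgts.size(1)))
    for i, w in enumerate(new_itos):
        j = old_stoi.get(w)
        new_wgts[i] = old_wgts[j] if j is not None else mean
    return new_wgts

# Toy example: 'movie' keeps its pretrained row, 'xxnewword' gets the mean.
old_itos = ['the', 'movie', 'good']
new_itos = ['the', 'movie', 'xxnewword']
print(match_embeddings(torch.randn(3, 4), old_itos, new_itos).shape)  # torch.Size([3, 4])
```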
What about other languages than English?
There is a token that specifies whether the next word is capitalized. At test time, it works like any other token.
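As a toy illustration of that trick (not fastai's actual tokenizer, which emits `xxmaj` via a fuller set of rules):

```python
def add_caps_tokens(words, maj_tok='xxmaj'):
    "Prefix capitalized words with a marker token, then lower-case them."
    out = []
    for w in words:
        if w[:1].isupper():
            out.append(maj_tok)
        out.append(w.lower())
    return out

print(add_caps_tokens("My name is Jeremy".split()))
# ['xxmaj', 'my', 'name', 'is', 'xxmaj', 'jeremy']
```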
Already answered in this same thread.