Is there bias in language models, such as gender or race bias in embeddings? How do we deal with it?
I will look into this next time.
There are many algorithms, statistical as well as neural (e.g. LSTMs), that can be used to build language models.
Does the wiki-text model and the fine-tuned version share the same vocabulary?
I can hear you clearly.
Curious to see if a similar model for Russian exists. Articles on Wikipedia in other languages can be much more limited in quantity and length.
Yes, always. There are debiasing techniques…
Can transfer learning from WikiText be used for different domains that WikiText has no context for?
There is bias in language models. I’ve talked about it some here and here (including some approaches for dealing with it).
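To make "debiasing techniques" a bit more concrete, here is a minimal sketch of one commonly described idea: estimate a bias direction from a word pair and remove that component from other word vectors. It is only an illustration and not necessarily one of the approaches linked above. The 3-d vectors are made up and the helper names (`dot`, `neutralize`) are hypothetical.

```python
def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def neutralize(vec, direction):
    """Remove the component of vec that lies along direction."""
    scale = dot(vec, direction) / dot(direction, direction)
    return [a - scale * d for a, d in zip(vec, direction)]

# Made-up 3-d vectors, purely for illustration (not real embeddings).
he = [1.0, 0.2, 0.1]
she = [-1.0, 0.2, 0.1]
doctor = [0.4, 0.9, 0.3]

# Assumption: the difference of one word pair approximates the bias direction.
gender_direction = [a - b for a, b in zip(he, she)]
doctor_debiased = neutralize(doctor, gender_direction)

print("before:", dot(doctor, gender_direction))
print("after: ", dot(doctor_debiased, gender_direction))
```

After the projection, the "doctor" vector has no component left along the estimated direction. Real embeddings would need a more robust estimate of that direction than a single word pair.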
Can Jeremy mention exactly which GPU he used for the IMDB notebook, especially since many people run out of GPU memory running it on a P4?
So is the wikitext103 base model somewhat superior to other models, such as one trained on Reddit, because wikitext contains more English vocabulary?
I believe it was a 1080 Ti with 11GB
Is the idea of a pre-trained language model similar to BERT, which I came across while looking into language models?
What if you want to keep the unknown words (rather than replacing them with xxx)? As long as a word is used more than once, will it be kept?
The core idea is that even if wikitext is not that similar to the corpus you are interested in, you can still pre-train with wikitext, and then fine tune using the corpus you’re interested in.
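On the shared-vocabulary question asked earlier: the fine-tuning corpus usually has its own vocabulary, so the pre-trained embedding rows have to be carried over somehow. Below is a minimal sketch of one way to do that: copy the row for every token the two vocabularies share, and start unseen tokens at the mean of the pre-trained rows. I believe fastai does something along these lines when loading the wikitext weights, but treat that as an assumption. The function name `remap_embeddings` and the tiny vocabularies are hypothetical.

```python
def remap_embeddings(old_vocab, old_emb, new_vocab):
    """Build an embedding table for new_vocab from a pre-trained one.

    Tokens present in old_vocab keep their pre-trained row; tokens that are
    new to the target corpus start from the mean of all pre-trained rows.
    """
    dim = len(old_emb[0])
    mean_row = [sum(row[i] for row in old_emb) / len(old_emb) for i in range(dim)]
    old_index = {tok: i for i, tok in enumerate(old_vocab)}
    new_emb = []
    for tok in new_vocab:
        if tok in old_index:
            new_emb.append(list(old_emb[old_index[tok]]))  # reuse pre-trained row
        else:
            new_emb.append(list(mean_row))  # unseen token: start at the mean
    return new_emb

# Made-up 2-d embeddings, purely for illustration.
old_vocab = ["the", "movie", "city"]
old_emb = [[0.0, 1.0], [2.0, 3.0], [7.0, 8.0]]
new_vocab = ["the", "movie", "blockbuster"]

new_emb = remap_embeddings(old_vocab, old_emb, new_vocab)
for tok, row in zip(new_vocab, new_emb):
    print(tok, row)
```

The rows for shared tokens give fine-tuning a head start, while tokens that only appear in the target corpus are learned during fine-tuning.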
Is the fastai library available by default in Kaggle kernels? It seems they don’t allow installing custom packages in GPU kernels. There is a Quora kernels-only competition, so I wonder if it is possible to use the library there.
How do I change the size of the vocabulary, e.g. from 6000 to 8000?
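For the vocabulary questions above, here is a minimal sketch of how a size cap and a minimum frequency typically interact when a vocabulary is built. The fastai text loaders I have seen take `max_vocab` and `min_freq` arguments, but check the exact names for your version. The `build_vocab` and `numericalize` helpers and the `<unk>` placeholder are hypothetical, not fastai's own code.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary tokens

def build_vocab(tokens, max_vocab, min_freq):
    """Keep at most max_vocab tokens, each seen at least min_freq times."""
    counts = Counter(tokens)
    kept = [tok for tok, n in counts.most_common(max_vocab) if n >= min_freq]
    return [UNK] + kept  # index 0 is reserved for unknown tokens

def numericalize(tokens, vocab):
    """Map tokens to ids, sending anything outside the vocab to UNK (id 0)."""
    index = {tok: i for i, tok in enumerate(vocab)}
    return [index.get(tok, 0) for tok in tokens]

tokens = "the film was great the film was long the plot was thin".split()

small = build_vocab(tokens, max_vocab=3, min_freq=2)
large = build_vocab(tokens, max_vocab=10, min_freq=1)

print(small)
print(large)
print(numericalize(["the", "plot"], small))
```

Raising the cap and lowering the minimum frequency keeps more rare words instead of mapping them to the unknown token, at the cost of a larger embedding matrix.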
Can Jeremy confirm?