There is bias in language models. I’ve talked about it some here and here (including some approaches for dealing with it).
Can Jeremy mention what exact GPU he used for the IMDb notebook, especially considering that many people run out of GPU memory running it on a P4?
So is the wikitext-103 base model somewhat superior to other models, such as one trained on Reddit, because wikitext contains more English vocabulary?
I believe it was a 1080 Ti with 11GB.
Is the idea of a pre-trained language model similar to BERT, which I came across while looking into language models?
What if you want to keep the unknown words (rather than replacing them with xxunk)? As long as a word is used more than once, will it be kept?
The core idea is that even if wikitext is not that similar to the corpus you are interested in, you can still pre-train with wikitext and then fine-tune on the corpus you're interested in.
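In code, the workflow looks roughly like this: a minimal sketch using the fastai v1 API; the sample dataset (`URLs.IMDB_SAMPLE`), learning rates, and epoch counts are just illustrative.

```python
from fastai.text import *

# Sketch of the pre-train / fine-tune idea: start from the wikitext-103
# pretrained AWD-LSTM and fine-tune the language model on your own corpus.
path = untar_data(URLs.IMDB_SAMPLE)
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')

# language_model_learner loads the wikitext-103 pretrained weights by default
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.fit_one_cycle(1, 1e-2)            # fine-tune the newly added head first
learn.unfreeze()
learn.fit_one_cycle(3, 1e-3)            # then fine-tune the whole model
learn.save_encoder('fine_tuned_enc')    # the encoder can be reused for a classifier
```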
Is the fastai library available by default in Kaggle kernels? It seems they don't allow installing custom packages in GPU kernels. There is a Quora kernels-only competition, so I wonder if it is possible to use the library there.
How do you change the size of the vocabulary, say from 6000 to 8000?
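If I remember correctly, in fastai v1 you set this when building the DataBunch; a quick sketch (the sample data and numbers are just for illustration):

```python
from fastai.text import *

path = untar_data(URLs.IMDB_SAMPLE)   # sample data, just for illustration
# max_vocab and min_freq control the vocab size (defaults are 60000 and 2)
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv', max_vocab=8000, min_freq=2)
print(len(data_lm.vocab.itos))        # about 8000, plus a few special xx* tokens
```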
Can Jeremy confirm?
Please post this kind of question in the Advanced Section. But yes, BERT is another example of transfer learning in NLP, using a different backbone (a Transformer) and solving different tasks (masked language modelling + next-sentence prediction).
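For anyone curious, here is a minimal sketch of BERT's masked-language-model objective, assuming the Hugging Face `transformers` library is installed (it is not part of the fastai course):

```python
from transformers import pipeline

# Fill-in-the-blank with a pretrained BERT: the model predicts the [MASK] token
# from both left and right context, which is the masked LM pre-training task.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("The movie was absolutely [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```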
Do we build the vocab again from scratch, or do we use the vocab from the pre-trained wikitext model?
Wouldn't you be missing words if the vocab is limited to a small number?
Jeremy said the traditional approach is to convert everything to lower case. Then how do you predict case? Is there a separate model / technique for that?
You wouldn't be able to use the built-in language models for that competition, because external data is prohibited.
We build it from scratch, because we have new words that didn't exist on Wikipedia, and then we match the two vocabularies.
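Roughly, the matching works like this: a simplified sketch, not fastai's exact code. Pretrained embedding rows are copied for words shared with the wikitext vocab, and words that are new get the mean of the pretrained embeddings.

```python
import torch

def match_embeddings(old_wgts, old_itos, new_itos):
    "Map pretrained embedding rows onto a new vocab; unseen words get the mean row."
    old_stoi = {w: i for i, w in enumerate(old_itos)}
    mean = old_wgts.mean(0)
    new_wgts = old_wgts.new_zeros((len(new_itos), old_wgts.size(1)))
    for i, w in enumerate(new_itos):
        j = old_stoi.get(w)
        new_wgts[i] = old_wgts[j] if j is not None else mean
    return new_wgts

# Toy example: 'movie' keeps its pretrained row, 'xxnewword' gets the mean.
old_itos = ['the', 'movie', 'good']
new_itos = ['the', 'movie', 'xxnewword']
print(match_embeddings(torch.randn(3, 4), old_itos, new_itos).shape)  # torch.Size([3, 4])
```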
What about other languages than English?
There is a token that specifies whether the next word is capitalized. At test time, it works like any other token.
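As a toy illustration of that trick (not fastai's actual tokenizer, which emits `xxmaj` via a fuller set of rules):

```python
def add_caps_tokens(words, maj_tok='xxmaj'):
    "Prefix capitalized words with a marker token, then lower-case them."
    out = []
    for w in words:
        if w[:1].isupper():
            out.append(maj_tok)
        out.append(w.lower())
    return out

print(add_caps_tokens("My name is Jeremy".split()))
# ['xxmaj', 'my', 'name', 'is', 'xxmaj', 'jeremy']
```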
Already answered in this same thread.