Lesson 4 In-Class Discussion ✅

What if you want to keep the unknown words (rather than replacing them with xxx)? Will a word be kept as long as it's used more than once?

The core idea is that even if wikitext is not that similar to the corpus you are interested in, you can still pre-train on wikitext and then fine-tune on the corpus you're interested in.
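
Roughly, with the fastai v1 API from the course, that looks something like the sketch below (the path, csv name, and hyper-parameters are just placeholders):

```python
from fastai.text import *

path = Path('my_corpus')  # placeholder: wherever your own corpus lives

# Build a language-model DataBunch from your corpus ('texts.csv' is a placeholder)
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')

# Start from the Wikitext-103 pre-trained AWD-LSTM and fine-tune it on your corpus
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.fit_one_cycle(1, 1e-2)          # train the new layers first
learn.unfreeze()
learn.fit_one_cycle(1, 1e-3)          # then fine-tune the whole model
learn.save_encoder('fine_tuned_enc')  # reuse the encoder later for the classifier
```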

Is the fastai library available by default in Kaggle kernels? It seems they don't allow installing custom packages in GPU kernels. There is a Quora kernels-only competition, so I wonder if it's possible to use the library there.

How do you change the size of the vocabulary, e.g. from 6000 to 8000?
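
For context, I mean something like the toy sketch below (not fastai's actual code, just how I understand a frequency-capped vocab with an xxunk fallback; I believe the library exposes this through max_vocab / min_freq parameters):

```python
from collections import Counter

def build_vocab(token_lists, max_vocab=8000, min_freq=2):
    """Keep the most frequent tokens; everything else later maps to xxunk."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    itos = ['xxunk', 'xxpad']  # special tokens first
    for tok, c in counts.most_common(max_vocab):
        if c >= min_freq and tok not in itos:
            itos.append(tok)
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

# min_freq=1 would keep words that appear only once
itos, stoi = build_vocab([['the', 'movie', 'was', 'great', 'great']],
                         max_vocab=8000, min_freq=1)
```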

Can Jeremy confirm?

Please post this kind of question in the Advanced section. But yes, BERT is another example of transfer learning in NLP, using a different backbone (a transformer) and solving different pre-training tasks (masked language modelling + next-sentence prediction).

Do we build the vocab again from scratch, or do we use the vocab from the pre-trained wikitext model?

Wouldn't you be missing words if the vocab is limited to a small number?

Jeremy said the traditional approach is to convert everything to lower case. Then how do you predict case? Is there a separate model / technique for that?

You wouldn't be able to use the built-in language models for that competition because external data is prohibited.

We build it from scratch because we have new words that didn't exist on Wikipedia, and then we match it against the pre-trained vocab.
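
A minimal sketch of that matching step (my understanding, not fastai's actual code: rows for words wikitext has seen are copied over, and words it has never seen start from the mean pre-trained embedding):

```python
import numpy as np

def match_embeddings(new_itos, old_stoi, old_emb):
    """Build an embedding matrix for the new vocab from a pre-trained one."""
    mean_emb = old_emb.mean(axis=0)
    new_emb = np.tile(mean_emb, (len(new_itos), 1))  # default row for unseen words
    for i, tok in enumerate(new_itos):
        j = old_stoi.get(tok)
        if j is not None:
            new_emb[i] = old_emb[j]  # copy the pre-trained row when the word overlaps
    return new_emb
```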

What about languages other than English?

There is a token that specifies that the next word is capitalized. At test time, it works like any other token.
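
Something like this toy version of the rule (the xxmaj / xxup names follow fastai's tokens, but the details below are a simplification of what the tokenizer really does):

```python
def mark_caps(tokens, tk_maj='xxmaj', tk_up='xxup'):
    """Lower-case tokens but keep the case information as extra tokens."""
    out = []
    for tok in tokens:
        if tok.isupper() and len(tok) > 1:
            out += [tk_up, tok.lower()]   # 'NOT' -> 'xxup', 'not'
        elif tok[:1].isupper():
            out += [tk_maj, tok.lower()]  # 'The' -> 'xxmaj', 'the'
        else:
            out.append(tok)
    return out

mark_caps(['The', 'movie', 'was', 'NOT', 'good'])
# -> ['xxmaj', 'the', 'movie', 'was', 'xxup', 'not', 'good']
```

So at generation time the model just predicts xxmaj / xxup like any other token, and the capitalization can be restored afterwards.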

Already answered in this same thread.

The language model we use is publicly available, though. No?

Yes, correct, but I was thinking of using the library instead of writing a custom training loop and models =) Also, they already provide some embeddings.

Why not also train the language model on the unsupervised entries in the IMDB dataset?

If we don’t hold out a validation set, does that mean there is no possibility of overfitting when building the language model?

We do!
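
For reference, the course notebook builds the language-model data roughly like this (fastai v1 data block API; the IMDB folder names include the unlabelled 'unsup' reviews, and 10% is still held out for validation so you can watch for overfitting):

```python
from fastai.text import *

path = untar_data(URLs.IMDB)

data_lm = (TextList.from_folder(path)
           .filter_by_folder(include=['train', 'test', 'unsup'])  # use all the text; labels not needed
           .split_by_rand_pct(0.1)   # hold out 10% to track validation loss
           .label_for_lm()           # the "labels" are just the next words
           .databunch(bs=48))
```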

How do you expand the vocab from wikitext to medical records when using transfer learning, assuming the vocab only contains high-frequency English words from Wikipedia?
