Lesson 4 In-Class Discussion ✅

Is there bias in language models, such as gender or race bias in embeddings? How do we deal with it?

5 Likes

I will look into this next time.

There are many algorithms, statistical as well as neural (e.g. LSTMs), that can be used to build language models.

1 Like

Do the wikitext model and the fine-tuned version share the same vocabulary?

As a matter of fact yes! Check the language model zoo topic :wink:

9 Likes

I can hear you clearly.

1 Like

Curious to see if a similar model for Russian exists. Articles on Wikipedia in other languages can be much more limited in quantity and length.

Yes, always. There are debiasing techniques…

2 Likes

Can transfer learning from wikitext be used in domains for which wikitext does not have any context?

There is bias in language models. I’ve talked about it some here and here (including some approaches for dealing with it).

13 Likes

It exists

1 Like

Can Jeremy mention what exact GPU he used for the IMDB notebook, especially considering that many people run out of GPU memory running it on a P4?

1 Like

So is the wikitext103 base model somewhat superior to other models (e.g. one trained on Reddit) because wikitext contains more English vocabulary?

3 Likes

I believe it was a 1080 Ti with 12GB.

Is the idea of a pre-trained language model similar to BERT, which I came across while looking into language models?

7 Likes

What if you want to keep the unknown words (rather than replacing them with xxx)? As long as a word is used more than once, will it be kept?

5 Likes
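For context on how this usually works: fastai-style tokenization keeps only tokens that clear a minimum frequency and replaces the rest with a special unknown token. A minimal pure-Python sketch of that idea (the `xxunk` token name is borrowed from fastai's convention, and the `min_freq` parameter name is an assumption for illustration; check your library version for the exact API):

```python
from collections import Counter

UNK = "xxunk"  # placeholder unknown token (name borrowed from fastai's convention)

def build_vocab(tokens, min_freq=2):
    """Keep only tokens that occur at least min_freq times."""
    counts = Counter(tokens)
    return [UNK] + [tok for tok, c in counts.most_common() if c >= min_freq]

def numericalize(tokens, vocab):
    """Map tokens to ids; anything out of vocabulary becomes the unknown id."""
    index = {tok: i for i, tok in enumerate(vocab)}
    return [index.get(tok, index[UNK]) for tok in tokens]

tokens = "the cat sat on the mat the cat ran".split()
vocab = build_vocab(tokens, min_freq=2)   # keeps only "the" and "cat"
ids = numericalize(tokens, vocab)         # rare words all share the unknown id
```

So with a threshold of 2, a word used more than once is kept; anything rarer collapses to the single unknown token.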

The core idea is that even if wikitext is not that similar to the corpus you are interested in, you can still pre-train on wikitext and then fine-tune on the corpus you're interested in.

3 Likes
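The two-stage idea above can be sketched with a toy count-based bigram model (purely illustrative; ULMFiT uses an AWD-LSTM, not counts, and the class and method names here are made up for the example):

```python
from collections import Counter

class BigramLM:
    """Toy count-based bigram language model (illustrative only)."""
    def __init__(self):
        self.counts = Counter()

    def train(self, tokens):
        # accumulate bigram counts; calling this again = fine-tuning on more data
        self.counts.update(zip(tokens, tokens[1:]))

    def score(self, a, b):
        # relative frequency of the bigram (a, b)
        total = sum(self.counts.values())
        return self.counts[(a, b)] / total if total else 0.0

# Stage 1: "pre-train" on a large generic corpus (stand-in for wikitext103).
lm = BigramLM()
lm.train("the cat sat on the mat".split())

# Stage 2: fine-tune the same model on the in-domain corpus
# (stand-in for e.g. IMDB reviews); general statistics are kept, new ones added.
lm.train("the movie was great".split())
```

After fine-tuning, the model still knows the generic bigrams while having picked up domain-specific ones, which is the essence of the transfer.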

Is the fastai library available by default in Kaggle kernels? It seems they don't allow installing custom packages in GPU kernels. There is a Quora kernels-only competition, so I wonder if it is possible to use the library there.

How do you change the size of the vocabulary, e.g. from 6000 to 8000?
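For what it's worth, the usual mechanism is a cap applied when the vocabulary is built: keep the N most frequent tokens and map everything else to the unknown token (fastai commonly exposes this as a `max_vocab`-style argument, though the exact parameter name may differ by version). A toy sketch of the capping step:

```python
from collections import Counter

def cap_vocab(counts, max_vocab):
    """Keep only the max_vocab most frequent tokens; the rest map to unknown."""
    return [tok for tok, _ in counts.most_common(max_vocab)]

counts = Counter({"the": 50, "cat": 20, "sat": 5, "on": 3, "mat": 1})
small = cap_vocab(counts, 3)   # only the three most frequent tokens survive
```

Raising the cap therefore grows the vocabulary by admitting progressively rarer tokens.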

Can Jeremy confirm?