Lesson 4 In-Class Discussion ✅

Does anyone know how to change the size of the vocabulary? The default, as shown, is 60,000. Can we change that number?

You can set that in both approaches; the factory methods take it in their kwargs.
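
As a minimal sketch of the factory-method route, assuming fastai v1 and a placeholder path/csv name, the vocab size can be passed straight through:

    from fastai.text import *
    # 'path' and 'texts.csv' are placeholders; max_vocab caps the vocabulary
    # built during numericalization (rarer tokens map to xxunk).
    data_lm = TextLMDataBunch.from_csv(path, 'texts.csv', max_vocab=30000)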

What is the loss measuring in the language model when it is trying to predict the next word?
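
For anyone following along, a minimal sketch (plain PyTorch, not the fastai internals) of what that loss computes: the model scores every word in the vocab for the next position, and the loss is the cross-entropy between those scores and the word that actually comes next.

    import torch
    import torch.nn.functional as F
    vocab_size = 10
    logits = torch.randn(1, vocab_size)     # model's scores for the next word
    target = torch.tensor([3])              # index of the word that actually came next
    loss = F.cross_entropy(logits, target)  # small when index 3 got high probability
    print(loss.item())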

With a training error of 3.84 and a validation error of 3.97, Jeremy says the model is still under-fitting. We have mentioned that when the training error is larger than the validation error, the model is under-fitting. So how do we tell that it is still under-fitting here, when the training error is lower than the validation error?

Is there no normalization step here because all the input data is in plain text format?

Backwards? You mean bidirectional?

Thanks. Could you please explain a bit more how that is possible? I assume we fix the vocab and generate the embeddings using their indexes.

Yes sir. A pre-trained model that is trained on wiki103 in the reverse direction (the one we’re looking at is trained with a forward pass through the documents).
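
In other words (a tiny sketch, nothing fastai-specific): the backwards model is trained on the same text with each document's tokens reversed, so it learns to predict the previous word instead of the next one.

    tokens = ['the', 'movie', 'was', 'great']   # hypothetical tokenized document
    backwards_tokens = tokens[::-1]             # ['great', 'was', 'movie', 'the']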

This is for the advanced topic, but the short answer is that new words will have random embeddings at first, and the model will learn them afterwards.
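
A rough sketch of that idea (not fastai's exact code; all names below are made up): rows of the pretrained embedding matrix are copied over for words present in both vocabs, and new words start from a fresh initialization that then gets learned during fine-tuning.

    import torch
    emb_dim = 400
    old_itos = ['the', 'movie', 'was']                    # hypothetical pretrained vocab
    new_itos = ['the', 'movie', 'was', 'zombieland']      # hypothetical target vocab with a new word
    old_emb = torch.randn(len(old_itos), emb_dim)         # stands in for the pretrained weights
    old_stoi = {w: i for i, w in enumerate(old_itos)}
    new_emb = torch.randn(len(new_itos), emb_dim) * 0.01  # new words start from scratch
    for i, w in enumerate(new_itos):
        if w in old_stoi:
            new_emb[i] = old_emb[old_stoi[w]]             # known words keep their pretrained row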

Sweet, that just answered my question.

What happens to new words in IMDb? Will they get added to the original language model?

No, because they will share the same vocab.
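
A sketch of what that sharing looks like in practice, assuming fastai v1 and that data_lm (the language-model DataBunch) and path already exist:

    from fastai.text import *
    # Reuse the language model's vocab for the classifier data so both map
    # words to the same ids; anything not in that vocab becomes xxunk.
    data_clas = TextClasDataBunch.from_csv(path, 'texts.csv', vocab=data_lm.train_ds.vocab, bs=32)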

I got an error: __init__() got an unexpected keyword argument 'max_vocab'

But I guess max_vocab was from TextDataset and TextLMDataBunch inherits from TextDataBunch.

Could you point out how to set that in the data block API? Also the tokenizer.

How do you add a custom tokenizer with the data block API?

What else can we override when we load a saved DataBunch … anything besides batch size?

data_clas = TextClasDataBunch.load(path, 'tmp_clas', bs=50)

You have to look at what is now the preprocessor.
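
For the max_vocab part of the question, a sketch of how that looks with the data block API in fastai v1 (path is a placeholder; the processors make the tokenize/numericalize steps explicit):

    from fastai.text import *
    processors = [TokenizeProcessor(), NumericalizeProcessor(max_vocab=30000, min_freq=2)]
    data_lm = (TextList.from_folder(path, processor=processors)
               .split_by_rand_pct(0.1)
               .label_for_lm()
               .databunch(bs=48))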

Is it possible to do regression using a pre-trained model instead of just classification? What would the databunch look like for that?
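
This one isn't answered in the thread, but as a hedged sketch of how a regression databunch could be built in fastai v1: labelling with label_cls=FloatList gives a continuous target (df and the 'text'/'score' columns are hypothetical).

    from fastai.text import *
    data_reg = (TextList.from_df(df, path, cols='text')
                .split_by_rand_pct(0.1)
                .label_from_df(cols='score', label_cls=FloatList)   # float labels -> regression
                .databunch(bs=48))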

What do you recommend for tokenizing/segmenting CJK languages? Isn’t that complicated?

Same, look at the processor argument.
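
And for the custom tokenizer, a sketch along the same lines (fastai v1; swap SpacyTokenizer for whatever tokenizer class you want, everything else is a placeholder):

    from fastai.text import *
    my_tok = Tokenizer(tok_func=SpacyTokenizer, lang='en')   # replace tok_func with your own
    processors = [TokenizeProcessor(tokenizer=my_tok), NumericalizeProcessor(max_vocab=30000)]
    data_clas = (TextList.from_folder(path, processor=processors)
                 .split_by_rand_pct(0.1)
                 .label_from_folder()
                 .databunch(bs=48))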

It is believed that a batch size that is a power of 2 is more efficient. However, it also makes sense that a larger batch size that utilizes the GPU better makes training faster. So how much exactly does a batch size that is not a power of 2 hurt? For example, how do bs=32 and bs=33 compare in training speed?
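
One way to answer that empirically (a stand-alone timing sketch with a toy model; it needs a GPU and isn't claiming any particular result):

    import time, torch
    model = torch.nn.Linear(1000, 1000).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for bs in (32, 33):
        x = torch.randn(bs, 1000, device='cuda')
        torch.cuda.synchronize(); t0 = time.time()
        for _ in range(200):
            opt.zero_grad()
            loss = model(x).pow(2).mean()   # dummy loss, just to exercise forward/backward
            loss.backward()
            opt.step()
        torch.cuda.synchronize()
        print(bs, time.time() - t0)         # wall-clock time for 200 steps at this batch size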
