Lesson 8 - Official topic

You still have the last layers of the network that you are fine-tuning.

Also, I may be wrong, but I thought Jeremy mentioned that fastai automatically freezes the initial layers when you fine-tune.
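FWIW, here is a rough sketch, from memory, of what `fine_tune` does (not the exact fastai source); `learn`, `base_lr` and `epochs` are placeholders for whatever learner and settings you built in the notebook:

```python
# Rough sketch of Learner.fine_tune's behaviour (from memory, not the exact
# fastai source). `learn`, `base_lr` and `epochs` are placeholders.
learn.freeze()                                  # pretrained body frozen, only the new head trains
learn.fit_one_cycle(1, base_lr)                 # warm up the randomly initialised head
learn.unfreeze()                                # now the pretrained layers train as well
learn.fit_one_cycle(epochs, slice(base_lr/100, base_lr))  # lower learning rates for earlier layers
```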

Since RNNs are free to generate an output sentence with a different number of words than the input sentence, I was thinking they might be able to express a given input sentence in different words (?)

AFAIK translation models do not use RNNs; they would use a seq2seq or transformer-based architecture. So I don’t think this statement is necessarily valid.

I’d guess that the vocabulary of a corpus is actually a fairly high-level representation of the semantic meaning. If so, then the low-level semantics and sentiments are captured in the frozen embedding layers, and the hope is that they are fairly universal. (Perhaps not so from English to genomic sequences or sheet music.)

Seq-to-seq models are also free to generate an output sentence with a different length than the input sentence.
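To illustrate that point, here is a toy sketch (all names and sizes are made up, this is not the lesson’s code): the decoder keeps emitting tokens until it produces an end-of-sequence token, so the output length is chosen by the model rather than fixed by the input length.

```python
import torch, torch.nn as nn

# Toy decoder loop: generation stops at an end-of-sequence token, so the
# output length is not tied to the input length. (Names/sizes made up.)
vocab_sz, n_hidden, EOS, BOS = 50, 32, 0, 1
emb  = nn.Embedding(vocab_sz, n_hidden)
cell = nn.GRUCell(n_hidden, n_hidden)
head = nn.Linear(n_hidden, vocab_sz)

h = torch.zeros(1, n_hidden)        # pretend this encodes the input sentence
tok, out = torch.tensor([BOS]), []
for _ in range(40):                 # hard cap so the toy loop always stops
    h = cell(emb(tok), h)
    tok = head(h).argmax(dim=-1)    # greedy choice of the next token
    if tok.item() == EOS: break
    out.append(tok.item())
print(len(out))                     # output length, independent of the input length
```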

Please remember to use the non-beginner topic for non-beginner discussion, and please focus on questions about what Jeremy is talking about right now :wink:

Why give the same weights to each word (token)? What if the last token has more effect on the predicted token?

We use the same weights for the input, not the same embeddings: each different token gets its own embedding.
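A small sketch of that distinction (sizes made up): there is one embedding layer, i.e. one shared weight matrix, used at every position, and each token id simply indexes its own row of it.

```python
import torch, torch.nn as nn

# One embedding layer = one shared weight matrix reused at every position;
# each token id indexes its own row of that matrix. (Sizes are made up.)
vocab_sz, n_hidden = 100, 64
i_h = nn.Embedding(vocab_sz, n_hidden)

x = torch.randint(0, vocab_sz, (8, 3))                 # a batch of 3-token sequences
e0, e1, e2 = i_h(x[:, 0]), i_h(x[:, 1]), i_h(x[:, 2])  # same weights at each position
print(i_h.weight.shape)                                # torch.Size([100, 64]): the shared parameters
print(torch.equal(i_h(torch.tensor([5])), i_h.weight[5:6]))  # True: token 5 always gets row 5
```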

Does the n in the loop of the recurrent NN map to the sequence length of the DL? Like, if the sequence length is 72, would it loop 72 times?

Sorry, just saw your note…

Yes, exactly.
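As a concrete sketch (sizes made up): with a sequence length of 72, the recurrent loop body runs 72 times, reusing the same layers at every step.

```python
import torch, torch.nn as nn

# With a sequence length of 72, the recurrent loop runs 72 times,
# reusing the same layers at every step. (Sizes are made up.)
vocab_sz, n_hidden, bs, sl = 100, 64, 8, 72
i_h = nn.Embedding(vocab_sz, n_hidden)   # input  -> hidden
h_h = nn.Linear(n_hidden, n_hidden)      # hidden -> hidden
h_o = nn.Linear(n_hidden, vocab_sz)      # hidden -> output

x = torch.randint(0, vocab_sz, (bs, sl)) # one batch of token ids
h = torch.zeros(bs, n_hidden)
for i in range(sl):                      # 72 iterations, one per token
    h = torch.relu(h_h(h + i_h(x[:, i])))
out = h_o(h)                             # prediction after the last token
print(out.shape)                         # torch.Size([8, 100])
```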

GPT-2 did not predict ULMFiT :wink:

But it did have some believable alternatives.

How were these generated?

Thanks :slight_smile:

Is LMModel3 a 4-layer model because of h? Or is it a 3-layer model since there are 3 nn objects?

So the truncated backprop is truncated at every batch?

It’s a one-layer recurrent model.
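For reference, here is a sketch along the lines of the book’s LMModel3, reconstructed from memory (details may differ from the actual notebook). There are three parameter-holding modules, but only `h_h` is the recurrent layer applied at every step; `self.h` is the hidden state (activations), not a layer, and detaching it at the end of `forward` is what truncates backprop at batch boundaries.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch along the lines of the book's LMModel3 (from memory; details may
# differ). h_h is the single recurrent layer, self.h is the hidden state
# (activations, not weights), and the detach() truncates backprop per batch.
class LMModel3(nn.Module):
    def __init__(self, vocab_sz, n_hidden, sl):
        super().__init__()
        self.i_h = nn.Embedding(vocab_sz, n_hidden)  # input  -> hidden
        self.h_h = nn.Linear(n_hidden, n_hidden)     # hidden -> hidden (reused every step)
        self.h_o = nn.Linear(n_hidden, vocab_sz)     # hidden -> output
        self.sl, self.h = sl, 0

    def forward(self, x):
        for i in range(self.sl):                     # one step per token in the sequence
            self.h = self.h + self.i_h(x[:, i])
            self.h = F.relu(self.h_h(self.h))
        out = self.h_o(self.h)
        self.h = self.h.detach()                     # keep the state, drop its gradient history
        return out

    def reset(self): self.h = 0                      # clear the state between epochs

model = LMModel3(vocab_sz=100, n_hidden=64, sl=72)
x = torch.randint(0, 100, (8, 72))
print(model(x).shape)                                # torch.Size([8, 100])
```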

How many parameters does an RNN end up having if we really only have one layer repeated multiple times?
Are we updating the parameters of the same layer at each loop iteration, or creating a new layer for each iteration?

So does self.h represent the one layer? Or is it self.h_h?
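One way to check this on the LMModel3-style sketch a few posts up: the parameter count does not depend on how many times the loop runs, because the same `h_h` weights are used (and updated) at every step, and `self.h` holds activations rather than parameters.

```python
# Reusing the LMModel3 sketch above: looping more times adds no parameters,
# and self.h never appears among the model's parameters.
count = lambda m: sum(p.numel() for p in m.parameters())
short, long_ = LMModel3(vocab_sz=100, n_hidden=64, sl=3), LMModel3(vocab_sz=100, n_hidden=64, sl=72)
print(count(short) == count(long_))        # True: 17,060 parameters either way
print([name for name, _ in short.named_parameters()])
# ['i_h.weight', 'h_h.weight', 'h_h.bias', 'h_o.weight', 'h_o.bias'] - no 'h'
```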