You still have the last layers of the network that you are fine-tuning.
Also, I may be wrong, but I thought Jeremy mentioned that fastai automatically freezes the initial layers.
Since RNNs are free to generate an output sentence with a different number of words than the input sentence, I was thinking they might be able to express a given input sentence in different words (?)
AFAIK translation models do not use plain RNNs. They would use a seq2seq or transformer-based architecture, so I don’t think that statement necessarily holds.
I’d guess that the vocabulary of a corpus is actually a fairly high-level representation of the semantic meaning. If so, then the low-level semantics and sentiments are captured in the frozen embedding layers, and the hope is that they are fairly universal. (Perhaps not so from English to genomic sequences or sheet music.)
Seq-to-seq models are also free to generate an output sentence with a different length than the input sentence.
Please remember to use the non-beginner topic for non-beginner discussion, and please focus on questions about what Jeremy is talking about right now
Why give similar weights to each word (token)? What if the last token has more effect on the predicted token?
We use the same weights for the input, not the same embeddings. Each different token gets its own embeddings.
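To make that concrete, here is a minimal sketch (my own illustration, not code from the lesson): each token id looks up its own row in the embedding matrix, but the same hidden-to-hidden weights are applied at every step of the loop.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_sz, n_hidden = 10, 4
emb = nn.Embedding(vocab_sz, n_hidden)   # each token id gets its own embedding row
h_h = nn.Linear(n_hidden, n_hidden)      # one weight matrix, reused at every step

x = torch.tensor([[1, 7, 3]])            # a 3-token sequence
h = torch.zeros(1, n_hidden)
for i in range(x.shape[1]):
    h = torch.relu(h_h(h + emb(x[:, i])))  # same h_h weights on every iteration

# different tokens map to different embedding vectors
same = torch.allclose(emb.weight[1], emb.weight[7])
```

So "same weights" refers to `h_h` being reused across time steps, while `emb` still stores a distinct vector per token.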
Does the n in the recurrent NN’s loop map to the sequence length of the DataLoader? Like, if the sequence length is 72, would it loop 72 times?
Sorry just saw your note…
Yes, exactly.
how were these generated?
Thanks
Is LMModel3 a 4-layer model because of h? Or is it a 3-layer model, since there are 3 nn objects?
So is the backprop truncated after every batch?
It’s a one-layer recurrent model.
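From memory, the model looks roughly like this (a sketch using plain `nn.Module` instead of fastai’s `Module`; details may differ slightly from the book). `h` is the hidden *state*, `h_h` is the one recurrent layer that gets applied on every loop iteration, and the `detach` is what truncates backprop between batches:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LMModel3(nn.Module):
    def __init__(self, vocab_sz, n_hidden):
        super().__init__()
        self.i_h = nn.Embedding(vocab_sz, n_hidden)  # input -> hidden
        self.h_h = nn.Linear(n_hidden, n_hidden)     # hidden -> hidden (the recurrent layer)
        self.h_o = nn.Linear(n_hidden, vocab_sz)     # hidden -> output
        self.h = 0                                   # hidden state, not a layer

    def forward(self, x):
        for i in range(x.shape[1]):                  # one step per token in the sequence
            self.h = self.h + self.i_h(x[:, i])
            self.h = F.relu(self.h_h(self.h))
        out = self.h_o(self.h)
        self.h = self.h.detach()                     # truncate backprop between batches
        return out

    def reset(self):
        self.h = 0

model = LMModel3(vocab_sz=30, n_hidden=64)
out = model(torch.randint(0, 30, (8, 3)))            # batch of 8 sequences of 3 tokens
```

So there are 3 `nn` objects, but only `h_h` is recurrent; `h` adds state, not an extra layer.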
How many parameters does an RNN end up having if we really only have one layer repeated multiple times?
Are we changing the parameters on the same layer at each loop, or creating a layer for each loop?
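My understanding is the former: the loop reuses the same layer, so its parameter count is fixed no matter how many times it runs, and that one layer’s weights accumulate gradients from every iteration. A quick sketch (my own example values):

```python
import torch.nn as nn

n_hidden = 64
h_h = nn.Linear(n_hidden, n_hidden)   # the single recurrent layer

# parameter count is fixed: 64*64 weights + 64 biases,
# regardless of how many times the loop applies h_h
n_params = sum(p.numel() for p in h_h.parameters())
```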
So does self.h represent the one layer? Or is it self.h_h?