Lesson 8 - Official topic

Right. But my question is: would translating English directly to English also work?

But would any translation service actually do anything? I've never checked, but I assume it would just give you back the same text.

Freeze the initial layers, especially the embedding one, as they are quite heavy to start with!


When training my language model, the GPU is only running at 20% even when I’m using large batch sizes (like 1024).

My language model has a sequence length of only 8, but that’s justified by the nature of my dataset (musical chords). Is this the reason why the GPU is so under-utilized?


Make sure the batch size is as large as will fit in GPU memory.
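
As a rough sketch (the file, column name, and numbers here are all placeholders), both bs and seq_len are set when building the DataLoaders. With seq_len=8 each batch contains very little work, so data loading and Python overhead can dominate unless bs is made much larger:

```python
import pandas as pd
from fastai.text.all import *

# Hypothetical chord dataset: one chord sequence per row in a 'text' column.
df = pd.read_csv('chords.csv')

# With seq_len=8 each sample is tiny, so per-batch compute is small;
# a very large bs helps keep the GPU busy.
dls = TextDataLoaders.from_df(df, text_col='text', is_lm=True,
                              seq_len=8, bs=2048)
```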


Wouldn’t freezing the embeddings defeat the purpose of fine-tuning on a new corpus, since you want the model to learn the new corpus’s vocabulary?


You still have the last layers of the network that you are fine-tuning.

Also, I may be wrong, but I thought Jeremy mentioned that fastai automatically freezes the initial layers.
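
For what it's worth, here is a minimal sketch of that workflow with the fastai text API (the dataset, metric, and learning rates are just illustrative). With a pretrained model, language_model_learner calls freeze() for you, so you only unfreeze once the head has trained a bit:

```python
from fastai.text.all import *

# Illustrative language-model fine-tuning on IMDb (stand-in for any corpus).
path = untar_data(URLs.IMDB)
dls = TextDataLoaders.from_folder(path, is_lm=True, valid='test')

# With pretrained weights, language_model_learner freezes the body,
# so initially only the last parameter group is trained.
learn = language_model_learner(dls, AWD_LSTM, metrics=accuracy)
learn.fit_one_cycle(1, 2e-2)

# Then unfreeze and fine-tune the whole network at a lower learning rate.
learn.unfreeze()
learn.fit_one_cycle(3, 2e-3)
```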

Since RNNs are free to generate an output sentence with a different number of words than the input sentence, I was thinking they might be able to express a given input sentence in different words(?)

AFAIK translation models do not use RNNs; they would use a seq2seq or transformer-based architecture, so I don’t think that statement necessarily holds.

I’d guess that the vocabulary of a corpus is actually a fairly high-level representation of the semantic meaning. If so, then the low-level semantics and sentiments are captured in the frozen embedding layers, and the hope is that they are fairly universal. (Perhaps not so from English to genomic sequences or sheet music.)

Seq-to-seq models are also free to generate an output sentence with a different length than the input sentence.
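
As a quick illustration (the model choice and sentence are arbitrary), any off-the-shelf translation model shows this: the decoder stops whenever it emits its end-of-sequence token, so the output length is not tied to the input length.

```python
from transformers import pipeline

# Arbitrary seq2seq translation model; output length is chosen by the decoder,
# not copied from the input.
translator = pipeline('translation_en_to_de', model='t5-small')

src = "I would really like a cup of coffee, please."
out = translator(src)[0]['translation_text']
print(len(src.split()), "->", len(out.split()))  # word counts usually differ
```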


Please remember to use the non-beginner topic for non-beginner discussion, and please focus on questions about what Jeremy is talking about right now :wink:


Why give similar weights to each word (token)? What if the last token has more effect on the predicted token?

We reuse the same weights across the input positions, not the same embeddings: each different token gets its own embedding.
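
To make that concrete, here is a minimal PyTorch sketch (names and sizes invented) in the spirit of the simple language model from the lesson: each token index looks up its own row of the embedding matrix, but the same layer weights are reused at every position in the sequence.

```python
import torch
from torch import nn

class SimpleLM(nn.Module):
    "Toy recurrent language model: per-token embeddings, shared weights."
    def __init__(self, vocab_sz, n_hidden):
        super().__init__()
        self.i_h = nn.Embedding(vocab_sz, n_hidden)  # one row per token in the vocab
        self.h_h = nn.Linear(n_hidden, n_hidden)     # reused at every position
        self.h_o = nn.Linear(n_hidden, vocab_sz)     # reused at every position

    def forward(self, x):                            # x: (batch, seq_len) token indices
        h = torch.zeros(x.shape[0], self.h_h.in_features, device=x.device)
        for i in range(x.shape[1]):                  # one step per token in the sequence
            h = h + self.i_h(x[:, i])                # each token gets its own embedding
            h = torch.relu(self.h_h(h))              # but h_h's weights are shared
        return self.h_o(h)                           # predict the next token
```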


Does the n in the loop of the recurrent NN map to the sequence length of the DataLoader? Like, if the sequence length is 72, it would loop 72 times?


Sorry just saw your note…

Yes, exactly.
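
Continuing the hypothetical SimpleLM sketch above (all numbers made up): with a sequence length of 72, the for-loop runs 72 times, once per token position.

```python
vocab_sz, seq_len, bs = 1000, 72, 64           # arbitrary sizes
x = torch.randint(0, vocab_sz, (bs, seq_len))  # a batch of token indices
preds = SimpleLM(vocab_sz, n_hidden=64)(x)     # the loop above runs seq_len (72) times
print(preds.shape)                             # torch.Size([64, 1000])
```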


GPT-2 did not predict ULMFiT :wink:

But it did have some believable alternatives.


How were these generated?
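
I don't know exactly how they were produced, but if it was just sampling from an off-the-shelf GPT-2, a minimal sketch with the Hugging Face transformers pipeline (the prompt and sampling settings are guesses) would look like this:

```python
from transformers import pipeline, set_seed

# Hypothetical reproduction: sample a few continuations from GPT-2.
generator = pipeline('text-generation', model='gpt2')
set_seed(42)

prompt = "The best way to fine-tune a language model is"
for out in generator(prompt, max_length=60, num_return_sequences=3, do_sample=True):
    print(out['generated_text'], '\n')
```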