Use this thread for questions/discussion of today’s lesson. Please do not use this for questions/comments about topics we haven’t covered yet - use the Lesson 5 further discussion thread for that. Also remember to watch the official updates thread.
What is the name of the article Jeremy is referencing?
Probably jumping ahead a bit, but could you please clarify whether this plot shows the learning rate changing during a single epoch, or over the whole training process? I expect it's the former.
Can Jeremy please write out that formula he keeps saying?
Jeremy is going to explain that plot later in the lesson. Please wait for him to go over it.
parameter = parameter - learning_rate * parameter.grad
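That update rule can be spelled out as a tiny, self-contained sketch. This is a hypothetical toy example (a single weight with loss f(p) = p**2), not code from the lesson notebook:

```python
# One step of gradient descent on a single parameter.
# Loss is f(p) = p**2, so the gradient at p is 2*p.
learning_rate = 0.1

parameter = 5.0           # current weight
gradient = 2 * parameter  # parameter.grad for this toy loss

parameter = parameter - learning_rate * gradient
print(parameter)  # 5.0 - 0.1 * 10.0 = 4.0
```

Repeating this step many times is all that "training" means at the lowest level.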
Question to the folks here: I forgot from the previous lesson, what are the blue-boxed activations and the purple-boxed activations? Why were two boxes drawn for the activation layers initially?
The blue boxes are the values before the activation function; the purple box is after the activation function.
When we load a pre-trained model, can we explore the activation grids to see what they might be good at recognizing? How can we generate those images?
What would we do if we have a very high number of classes, say on the order of 100,000?
Is data.classes different from data.c?
How do we initialize random weights?
data.c is the length of data.classes in general.
There are several articles about that. The basic approach is to initialize from a normal distribution with a standard deviation that depends on the number of input channels.
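As a sketch of that idea, here is He-style initialization in plain NumPy: the standard deviation is scaled by the number of inputs (fan_in). The layer sizes are made up for illustration; this is not the fastai internals:

```python
import numpy as np

# He initialization for a ReLU layer: std = sqrt(2 / fan_in),
# so activation variance stays roughly constant across layers.
fan_in, fan_out = 256, 128          # hypothetical layer sizes
std = np.sqrt(2.0 / fan_in)

weights = np.random.randn(fan_out, fan_in) * std
print(weights.std())  # empirically close to std
```

Different activation functions call for different scalings (e.g. Xavier/Glorot for tanh), but the pattern is the same.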
Discriminative layer training: https://docs.fast.ai/basic_train.html#Discriminative-layer-training
So when we first use transfer learning and train, are we only training the random layers we placed on top of the model, or are we also training the top few layers of the resnet?
No, only the newly added, randomly initialized layers are trained, as long as you haven't called learn.unfreeze().
Maybe a dumb question but why divided by 3?