Use this thread for questions/discussion of today’s lesson. Please do not use this for questions/comments about topics we haven’t covered yet - use the Lesson 5 further discussion thread for that. Also remember to watch the official updates thread.
What is the name of the article Jeremy is referencing?
Probably jumping ahead a bit, but could you please clarify whether this plot shows the learning rate changing during a single epoch, or over the whole training process? I expect it's the former.
Can Jeremy please write out that formula he keeps saying?
Jeremy is going to explain that plot later in the lesson. Please wait for him to go over it.
parameter = parameter - learning_rate * parameter.grad
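That update rule can be spelled out as a tiny, self-contained sketch. This is a hypothetical toy example (a single weight with loss f(p) = p**2), not code from the lesson notebook:

```python
# One step of gradient descent on a single parameter.
# Loss is f(p) = p**2, so the gradient at p is 2*p.
learning_rate = 0.1

parameter = 5.0           # current weight
gradient = 2 * parameter  # parameter.grad for this toy loss

parameter = parameter - learning_rate * gradient
print(parameter)  # 5.0 - 0.1 * 10.0 = 4.0
```

Repeating this step many times is all that "training" means at the lowest level.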
Question to the folks here: I forgot from the previous lesson, what are the blue-boxed activations and the purple-boxed activations? Why were two boxes drawn for the activation layers initially?
The blue boxes are the values before the activation function; the purple box is after the activation function.
When we load a pre-trained model, can we explore the activation grids to see what they might be good at recognizing? How can we generate those images?
What would we do if we have a very high number of classes, say on the order of 100,000?
Is data.classes different from data.c?
How do we initialize random weights?
data.c is the length of data.classes in general.
There are several articles about that. The basic approach is to initialize from a normal distribution with a standard deviation that depends on the number of input channels.
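As a sketch of that idea, here is He-style initialization in plain NumPy: the standard deviation is scaled by the number of inputs (fan_in). The layer sizes are made up for illustration; this is not the fastai internals:

```python
import numpy as np

# He initialization for a ReLU layer: std = sqrt(2 / fan_in),
# so activation variance stays roughly constant across layers.
fan_in, fan_out = 256, 128          # hypothetical layer sizes
std = np.sqrt(2.0 / fan_in)

weights = np.random.randn(fan_out, fan_in) * std
print(weights.std())  # empirically close to std
```

Different activation functions call for different scalings (e.g. Xavier/Glorot for tanh), but the pattern is the same.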
Discriminative layer training: https://docs.fast.ai/basic_train.html#Discriminative-layer-training
So when we first use transfer learning and train, are we only training the random layers we placed on top of the model, or are we also training the top few layers of the resnet?
No, only the newly added, randomly initialized layers are trained, as long as you haven't called learn.unfreeze().
Maybe a dumb question but why divided by 3?