Lesson 1 In-Class Discussion ✅

All requests for installation/setup help should go to the appropriate topic listed here:

The default batch size for training is 64, while the batch size for evaluation is 64*2.
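If you hit GPU memory limits you can override that when building the DataBunch. A minimal sketch in fastai v1 style, assuming `path_img`, `fnames`, and `pat` are set up as in the lesson 1 pets notebook:

```python
from fastai.vision import *

bs = 64  # drop to 32 or 16 if you run out of GPU memory
data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                   ds_tfms=get_transforms(), size=224, bs=bs)
```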

However, given the traffic this thread sees during the lecture, it takes a while just to scroll to the bottom of the ever-growing thread. A quicker way to post a new reply (instead of replying to any post in particular) is to click the Reply button that is always available in the scroll bar on the right side of the page.

2 Likes

Why does the lr_find call seem to abort at about 65%, with Jupyter displaying a red progress bar? See this part in the video.

I think Jeremy will explain that in later classes, but the idea behind lr_find is to keep increasing the learning rate until it becomes too big. Once it becomes too big, fastai stops looping because we already have everything we wanted.
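So the interrupted-looking red progress bar is expected, not a crash. A minimal sketch of the usual calls, assuming `learn` is a fastai v1 Learner:

```python
learn.lr_find()        # sweeps the learning rate upward, stops once the loss diverges
learn.recorder.plot()  # plot loss vs. learning rate to pick a good value
```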

2 Likes

They get stored here:
~/.fastai/data/oxford-iiit-pet/images/models/

So all data gets downloaded into the data folder of this “new” .fastai directory in your home directory, and the saved model states get stored there within the respective subdirectories.
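For reference, a rough sketch of the calls that create those files, assuming `learn` was built from the pets data as in the notebook:

```python
learn.save('stage-1')   # writes <learn.path>/models/stage-1.pth
learn.load('stage-1')   # loads it back

print(learn.path)       # base directory under which models/ is created
```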

2 Likes

When the loss becomes too high (due to the increased learning rate), lr_find() stops. You can control where it stops through the function's parameters (it used to be a clip value, but check the documentation).
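For example, in fastai v1 the sweep range and the early stopping can be adjusted through keyword arguments; the names below are from the v1 docs, so double-check them against your installed version:

```python
learn.lr_find(start_lr=1e-7,  # learning rate the sweep starts from
              end_lr=10,      # largest learning rate to try
              num_it=100,     # number of iterations in the sweep
              stop_div=True)  # stop early once the loss diverges
```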

@sgugger, is there an equivalent of Keras's

x = BatchNormalization()(x, training=False)

There has been a lot of hand-wringing about the fact that Keras also updates BatchNorm parameters even when trainable is set to False, i.e. the layer is frozen. In fact, if I remember right, in one of @Jeremy’s tweets he pointed to this exact issue as a problem with Keras. Any intuition as to why performance improves with the BN layers unfrozen? It seems counter-intuitive, especially since the following layers cannot adapt to the change in BatchNorm while they are frozen.
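For plain PyTorch (not fastai’s built-in handling, which as far as I know has its own train_bn flag on the Learner), the closest thing I know of to Keras’s training=False is to put the BN layers in eval mode and freeze their affine parameters. A rough sketch, with the helper name made up:

```python
import torch.nn as nn

def freeze_batchnorm(model: nn.Module):
    """Stop BatchNorm layers from updating running stats or learning gamma/beta."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()                      # use stored running mean/variance
            for p in m.parameters():
                p.requires_grad = False   # freeze the affine parameters too

# Note: model.train() flips BN back to training mode, so call this after it.
```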

But then again, what is intuitive and what is counter-intuitive seems all warped. Jeremy recently tweeted about an approach where the head’s weights are frozen at random initialization and the base model’s weights are updated. :wink: Completely topsy-turvy.

2 Likes

1e-6 is the learning rate for the very first layers (the layers Jeremy showed, which detect those edges and gradients), and 1e-4 is the learning rate for the later layers. We have actually unfrozen the complete net (meaning we are now training the weights of the net from the very first layers).

The basic concept behind this is that we don't want to change the weights of the earlier layers (the very first layers), which detect edges, gradients, etc., because they are already really good at it; hence a very small learning rate, meaning we won't change their weights to any great extent. The layers after that, where the net starts to recognize patterns, eyeballs, etc., are where we do want to change the weights, since that is where the net should learn patterns specific to our dataset; hence a greater learning rate than the earlier one. This greater learning rate gives those layers a better chance to adapt to our dataset and change their weights accordingly.

The Python slice(start, stop) object gives you the ability to apply these learning rates across the spread of layers, beginning with the small learning rate for the first layers and increasing it as we move forward through the network, as in the sketch below.
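A minimal sketch, roughly as it appears in the lesson 1 notebook (fastai v1):

```python
learn.unfreeze()                                  # train all layer groups
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))  # 1e-6 for the earliest group,
                                                  # scaling up to 1e-4 for the last
```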

Hope this clears your doubt

7 Likes

@jeremy talked about a post that would tell us how to generate our own datasets. I cannot find it anywhere. Does anyone know where to find it?

3 Likes

Does that mean that the results of the ConvLearner for epoch 4 when called with 5 epochs are the same as when called with 4 epochs?

ImageDataBunch from a folder, a CSV, or a list has already been discussed, but what can we do if we have a folder of multiple CSVs, like the quickdoodle recognition competition on Kaggle right now? Up till now I have been working with generators.

Is data normalization always necessary?
Doesn’t a batch normalization layer (if added to the network) do the same task?

BN is typically not done until after your first linear layer, right?

Batch normalization normalizes activations and not the input data.

Yes, it is absolutely necessary. Without it, the pixel values will be in the range 0-255, and running pretrained networks on these values will give very bad results, since those networks were trained on normalized images.
If you're training your own network without transfer learning (i.e. with randomly initialized weights), then it is not mandatory, as batch norm layers will take care of it in later stages, but it is still a good thing to have for smoother optimization.
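Concretely, in fastai v1 this is the one extra call from the lesson 1 notebook, assuming `data` is the ImageDataBunch created earlier:

```python
# Normalize with the per-channel ImageNet mean/std the pretrained model expects.
data.normalize(imagenet_stats)
```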

1 Like

I didn't look at the quickdoodle competition data, but combining the multiple CSVs into a single CSV is an option, something like the rough sketch below.
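A rough pandas sketch, with the folder path and output file name made up for illustration:

```python
import pandas as pd
from pathlib import Path

csv_dir = Path('data/train_csvs')            # hypothetical folder of CSVs
df = pd.concat((pd.read_csv(f) for f in sorted(csv_dir.glob('*.csv'))),
               ignore_index=True)
df.to_csv('data/train_combined.csv', index=False)  # single CSV for from_csv/from_df
```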

I have written a brief blog post based on my notes for lesson 1.

Please have a look.

4 Likes

He was using AWS SageMaker during the session.

You might want to get permission before sharing a link to the course site.

4 Likes