Lesson 3 In-Class Discussion ✅

Hi, I made this notebook to understand the transforms by visualizing them:

I have not found out how to append the “skew” transform to a list of transforms, so if you know how, I would like to see it.


Assign your loss function to learn.loss_func.

Thanks for showing that! :slight_smile: Note that you can also see examples of all transforms in the docs:



@safekidda I think the training was initially done on 128-sized images, and when the size is later increased and the weights from the previous model are reused, the new model learns faster because it builds on top of the previous one. But I would like others with more experience to comment and clear up the doubts.

@joshfp I’d point out that the new channels would likely retain some of their spatial information, so at least some of the weights of the early layers will transfer.

If you are still underfitting, try a different LR, and reduce the dropout.

Note that you get 0.5 dropout by default.

How does Cyclical Learning Rate compare to One Cycle? Previously they were denoted by use_clr and use_clr_beta.

How do we create our own Path object, given that I have the path as a string?
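One way to do this, assuming plain Python's pathlib is what is meant by a path object (the folder and file names below are made up for illustration):

```python
from pathlib import Path

# Hypothetical location, used only for illustration.
path_str = "data/planet"

# Wrap the string in Path to get a path object.
path = Path(path_str)

# The / operator joins path components, so no manual string concatenation is needed.
train_csv = path / "labels.csv"   # "labels.csv" is a made-up file name

print(train_csv.name)             # final component of the path
print(train_csv.parent == path)   # the parent is the original folder
```

Calling str(path) converts it back to a string if some function insists on one.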

Thanks. Yes, it used weights from the previous model, but I’m questioning how they would be useful given that the dimensions of the image have changed. If you think about it, the filters that were learned worked on small satellite images, so if everything suddenly got 4 times bigger, how good would, say, an edge detector be? The only way I can see it working is if the original model had augmentation applied so that it could work with zoomed images.

You are talking about frontier research. Be content with the LR finder, for now. It is a gigantic step forward with respect to any previous method.

Oh, I think the tool is fantastic. I’m just not content with my current understanding of how to interpret it, that’s all :wink:

When we first set up the model training, for example:
learn = create_cnn(data, models.resnet50, metrics=[error_rate, accuracy])
and then call fit_one_cycle,
what is the learning rate used by the model? Is there a default learning rate somewhere?

I would love to see an analysis of how it changes over time - do let us know if you run some experiments!


Yes. Hit shift-tab after you type ( to see the defaults, or check the docs. Once you find it, tell us what you learn! :slight_smile:

In lesson 3, progressive resizing is discussed a lot. At around 2:05:05 in the video, Jeremy says that we have trained with some size, 128x128, and now we can take the weights of this model and put them in a new learner created with 256x256 images.

Here is my question:

Isn’t the number of weights in the model dependent on the input size? If it is, how can the weights of a 128 model fit a 256 model?

The loss surface stays the same, but the optimal LR does change over time (iterations), since the finder cannot see the whole surface. It only sees the small patch it can observe by running a brief mock training on the minibatches it uses (this is why it does not perform well if your bs is too small: it cannot even get a decent local view). As you train, you move across the surface; this is the whole point of training. And as you move, the optimal LR changes.

See: https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html

Pay particular attention where it talks about the averaged plot vs. the raw plot. It is one of the reasons why you should never take the LR with the lowest loss (another is intrinsic (topological): you want to stay well away from the blow-up).

So, as a general rule of thumb: try and pick the point of maximum negative slope, as long as that point is reasonably away from the blowup and loss still decreases nicely around it.
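To make that rule of thumb concrete, here is a rough sketch in plain Python. It is not the fastai implementation: smooth and steepest_lr are hypothetical helper names, and the smoothing is a simple bias-corrected exponential moving average that I am assuming is close enough to the "averaged plot" idea. It also does not check the distance from the blow-up, which you still need to judge by eye.

```python
import math

def smooth(values, beta=0.9):
    """Bias-corrected exponential moving average of a list of losses."""
    avg, out = 0.0, []
    for i, v in enumerate(values, start=1):
        avg = beta * avg + (1 - beta) * v
        out.append(avg / (1 - beta ** i))
    return out

def steepest_lr(lrs, losses, beta=0.9):
    """Return the LR at which the smoothed loss falls fastest per unit of log10(lr).

    Returns None if the smoothed loss never decreases.
    """
    sm = smooth(losses, beta)
    best_i, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        d_loss = sm[i] - sm[i - 1]
        d_log_lr = math.log10(lrs[i]) - math.log10(lrs[i - 1])
        slope = d_loss / d_log_lr
        if slope < best_slope:          # more negative means steeper descent
            best_i, best_slope = i, slope
    return None if best_i is None else lrs[best_i]

# Made-up numbers that mimic a typical finder curve: flat, falling, then blowing up.
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
losses = [1.0, 0.9, 0.4, 0.35, 3.0]
print(steepest_lr(lrs, losses, beta=0.0))   # beta=0.0 disables smoothing
```

Note that the lowest raw loss in the made-up data sits at 1e-2, right before the blow-up, while the steepest descent is one step earlier.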

Do tend to prefer higher LRs if possible: since they act as a regularization method by themselves, they’ll help you avoid overfitting (see the relevant papers by Leslie Smith about superconvergence).


No, it’s not dependent on input size. We’ll learn why later in this course.


Thanks Jeremy. It is default_lr = slice(3e-3), specified in fastai/fastai/basic_train.py :slight_smile:
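For anyone unsure what slice(3e-3) means: in plain Python it is a slice object with only a stop value set. A minimal sketch of just that part (how fastai turns the slice into actual learning rates is not shown here):

```python
# A slice built with one argument has only `stop` set.
default_lr = slice(3e-3)

print(default_lr.start)   # no lower bound was given
print(default_lr.stop)    # the value passed in
print(default_lr == slice(None, 3e-3, None))
```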

For the convolutional layers, the number of weights depends on the filter size (and the number of input and output channels), so as long as you don’t change those, the number of weights stays the same. The input size only affects the number of activations of the convolutional layer.
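A quick arithmetic check of this, in plain Python. The helper names are hypothetical, and the layer settings (3 input channels, 64 filters of size 7, stride 2, padding 3, no bias) are only an example I picked because they resemble a typical first ResNet layer:

```python
def conv2d_param_count(in_ch, out_ch, kernel, bias=True):
    """Weights in a 2D conv layer: one kernel x kernel filter per (in, out) channel pair."""
    return out_ch * in_ch * kernel * kernel + (out_ch if bias else 0)

def conv2d_output_size(size, kernel, stride=1, padding=0):
    """Spatial size (height or width) of the conv output for a square input."""
    return (size + 2 * padding - kernel) // stride + 1

# Example layer settings (an assumption for illustration).
in_ch, out_ch, kernel, stride, padding = 3, 64, 7, 2, 3

# The parameter count never looks at the image size.
n_params = conv2d_param_count(in_ch, out_ch, kernel, bias=False)

for image_size in (128, 256):
    out_size = conv2d_output_size(image_size, kernel, stride, padding)
    n_activations = out_ch * out_size * out_size
    print(image_size, n_params, out_size, n_activations)
```

The weight count is identical for both image sizes, while the number of activations grows by a factor of four when the side length doubles.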

Then, so that the different number of activations coming out of the convolutional layers doesn’t affect the linear layers, a neat trick is used, called an Adaptive Pooling Layer. These pooling layers are similar to standard pooling layers (max or average pool), but they convert any input size to the specified target size (that’s why they are called adaptive). In this way, the number of inputs to the linear layers is always the same, no matter the image’s input size. You can check the adaptive pooling layers by running learn.model.
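Here is a small pure-Python sketch of adaptive average pooling on a single 2D feature map, just to show the idea. It is not the library code: adaptive_avg_pool2d is a hypothetical helper, and the window boundaries (floor for the start, ceil for the end) follow what I understand PyTorch's convention to be, so treat that detail as an assumption.

```python
def adaptive_avg_pool2d(grid, out_h, out_w):
    """Average-pool a 2D list of numbers down to exactly out_h x out_w."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        r0 = (i * in_h) // out_h                      # floor of window start
        r1 = ((i + 1) * in_h + out_h - 1) // out_h    # ceil of window end
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = ((j + 1) * in_w + out_w - 1) // out_w
            window = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

# Two feature maps of different sizes, as if from 128px and 256px inputs.
small = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
large = [[float(r * 8 + c) for c in range(8)] for r in range(8)]

pooled_small = adaptive_avg_pool2d(small, 2, 2)
pooled_large = adaptive_avg_pool2d(large, 2, 2)

# Both come out 2x2, so a following linear layer sees the same number of inputs.
print(len(pooled_small), len(pooled_small[0]))
print(len(pooled_large), len(pooled_large[0]))
```

The window size adapts to the input, which is the opposite of a standard pooling layer where the window is fixed and the output size varies.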


@safekidda, see @joshfp’s comment about “Adaptive Pooling”. The weights depend on the spatial structure in the data and not on the image size.