Lesson 3 - input size difference for pre-trained models

In lesson 3’s planet notebook, Jeremy mentioned a way to resize the original images after training for some time to make use of the trained model like a pre-trained network, and then training further to improve results.

However, when the images are resized, how does the pre-trained model adjust to the different input size in the first layer? I looked at the source code for create_cnn but still don’t really understand how the first layer adapts to the change in input size.

Thanks in advance!

@jeremy, could you please clarify this? I have the same question.
Thanks

I think fastai handles this for you through adaptive pooling, which PyTorch provides.

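For some intuition, think about the convolution operation (e.g. with a 2x2 filter): when the image size increases (e.g. from 128x128 to 256x256), the filter simply has more positions to slide over. The filter weights themselves do not depend on the image size, so the first layer does not need to change.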
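To make the arithmetic concrete, here is a small sketch using the standard convolution output-size formula. The helper name conv_output_size is made up for illustration, and the 7x7, stride 2, padding 3 settings are the ones torchvision's ResNet uses for its first conv layer.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

for size in (128, 256):
    # 2x2 filter, stride 1, no padding (the example from the post above)
    small = conv_output_size(size, kernel=2)
    # 7x7 filter, stride 2, padding 3 (ResNet-style first conv)
    first = conv_output_size(size, kernel=7, stride=2, padding=3)
    print(f"{size}x{size}: 2x2 filter -> {small}x{small}, 7x7/s2/p3 -> {first}x{first}")
```

The same filter is reused at every position, so only the size of the output feature map changes, not the number of weights.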

The output of that conv layer then has more elements than before. As I understand it, adaptive pooling takes care of this by always producing an output of a fixed size, so the ‘size requirement’ on images is not really a hard constraint.

fastai conveniently builds this in.

I think this is what powers it - https://pytorch.org/docs/stable/nn.html#torch.nn.AdaptiveAvgPool2d
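Here is a minimal sketch of the idea in plain PyTorch (not fastai's actual code). The layer sizes are arbitrary choices for illustration: the same conv layer accepts both image sizes, its output grows with the input, and nn.AdaptiveAvgPool2d brings it back to a fixed size.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=2)   # 2x2 filter; weights are independent of image size
pool = nn.AdaptiveAvgPool2d(1)          # always pools down to 1x1 per channel

def shapes(size):
    """Return (conv output shape, pooled shape) for a size x size RGB image."""
    x = torch.randn(1, 3, size, size)
    features = conv(x)
    pooled = pool(features)
    return tuple(features.shape), tuple(pooled.shape)

for size in (128, 256):
    feat_shape, pooled_shape = shapes(size)
    print(size, feat_shape, pooled_shape)
```

Whatever the spatial size of the feature map, the pooled tensor has the same shape, so any linear layers that follow see a fixed number of inputs.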

@adi_pradhan Thanks for that answer, it makes sense. Do you know where the “magic” happens in the source code? So far, I’ve been unsuccessful in finding where fastai calls the adaptive pooling.

An answer explained in another thread looks more reasonable to me.

Check this out as another reference.
https://forums.fast.ai/t/cnns-that-works-with-any-input-size/21415

In addition, I found that there is an adaptive pooling layer at the end of the ResNet model, which I believe is the key component that makes it adaptable to different input sizes.

No matter what size of input comes through to the last sequential block, these layers apply average pooling and max pooling to it and then concatenate the results into a (1 x 1 x 1024) tensor, so the remaining layers can generate the output as usual.
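A rough sketch of that pooling step in plain PyTorch is below. The class name ConcatPoolSketch is made up for illustration (I believe fastai's own layer for this is AdaptiveConcatPool2d), and the 512-channel input assumes a ResNet34-style body. Note that PyTorch puts channels first, so the result has shape (batch, 1024, 1, 1).

```python
import torch
import torch.nn as nn

class ConcatPoolSketch(nn.Module):
    """Concatenate adaptive max pooling and adaptive average pooling along channels."""
    def __init__(self, output_size=1):
        super().__init__()
        self.max = nn.AdaptiveMaxPool2d(output_size)
        self.avg = nn.AdaptiveAvgPool2d(output_size)

    def forward(self, x):
        # Each pool gives (batch, C, 1, 1); concatenating doubles the channel count
        return torch.cat([self.max(x), self.avg(x)], dim=1)

pool = ConcatPoolSketch()
# Feature maps of different spatial sizes, as produced by smaller and larger input images
for side in (4, 8):
    features = torch.randn(2, 512, side, side)
    print(side, tuple(pool(features).shape))
```

Both feature-map sizes come out with the same shape, which is why the layers after the pooling do not care about the original image size.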
