Changing Image Size during training!


I am trying to implement changing the image size during training in my own implementation (i.e. not with the fastai library), but whenever I change the size I get a size-mismatch error.
I want to know how @jeremy implemented this functionality in the library.

What I want to accomplish:
I want to train the same model with different image sizes, i.e. (224x224), (150x150), (100x100), (64x64).

How can I do it for my implementation?


You need to make the NN not care about the input size at some point. Global pooling and fully convolutional NNs are two examples of that :slight_smile:

Check out also the implementations of adaptive pooling layers in the fastai library.

I might be wrong on this one and can’t verify atm, but I think any model for transfer learning from the library will give you that.


I thought the default models in torchvision handled this for us! Am I wrong?

Some might, but not all. The VGG model in torchvision expects the output of the convolutional layers to be 7 x 7 pixels tall and wide. The model expects a 224x224 image as input and halves the spatial size 5 times. If you start with a 150x150 image, then 150/2^5 = 4 (rounded down), and the conv layers output a feature map of size 4 x 4. That does not match the size expected by the linear layers that follow the conv layers.
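The arithmetic above can be sketched in a few lines (`vgg_feature_size` is a hypothetical helper for illustration, not actual torchvision code):

```python
def vgg_feature_size(input_size: int, num_pools: int = 5) -> int:
    """Spatial size of the conv-feature map after `num_pools`
    2x2 max-pool layers, each halving the size (rounded down)."""
    size = input_size
    for _ in range(num_pools):
        size //= 2
    return size

print(vgg_feature_size(224))  # 7 -> matches the 7x7 the linear layers expect
print(vgg_feature_size(150))  # 4 -> size mismatch in the classifier
```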

In general, only models that apply some form of adaptive pooling before the linear layers, or models that do not use linear layers at all, will accept input images of any size. And even then you can get into trouble, since for very small input images the pooling/resizing steps may end up giving you an image of 0 x 0 pixels, which will result in an error message.
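Adaptive pooling sidesteps this by choosing the bin boundaries from the input size, so the output size is fixed no matter what comes in. Here is a minimal 1-D sketch of the idea (PyTorch's `nn.AdaptiveAvgPool2d` does the 2-D equivalent; this is an illustration, not the library's code):

```python
import math

def adaptive_avg_pool_1d(xs, out_size):
    """Average-pool `xs` into exactly `out_size` bins,
    whatever len(xs) is. Bin i covers indices
    [floor(i*n/out_size), ceil((i+1)*n/out_size))."""
    n = len(xs)
    out = []
    for i in range(out_size):
        start = (i * n) // out_size
        end = math.ceil((i + 1) * n / out_size)
        bin_vals = xs[start:end]
        out.append(sum(bin_vals) / len(bin_vals))
    return out

# out_size=1 is global average pooling: any input collapses to one value.
print(adaptive_avg_pool_1d([1.0, 2.0, 3.0, 4.0], 1))  # [2.5]
# A 9-element input still yields exactly 7 outputs.
print(len(adaptive_avg_pool_1d(list(range(9)), 7)))   # 7
```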


I believe fully convolutional models are the answer to your question. Instead of using Dense layers/Fully connected layers, you can use 1x1 convolution to get the same effect.
Jeremy talks a bit about this in the course, and you can find a more detailed explanation in Andrew Ng’s course:
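The reason a 1x1 convolution does not care about spatial size: it applies the same channels-in to channels-out linear map at every pixel. A toy pure-Python sketch (channels-first layout, no bias; just to show the idea):

```python
def conv1x1(x, w):
    """x: [C_in][H][W] feature map, w: [C_out][C_in] weights.
    Applies the same C_in -> C_out linear map at every spatial
    position, so it works for any H and W."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out = len(w)
    return [[[sum(w[o][i] * x[i][r][c] for i in range(c_in))
              for c in range(wd)]
             for r in range(h)]
            for o in range(c_out)]

# Works on a 2x2 map...
y = conv1x1([[[1, 2], [3, 4]]], [[2.0]])
print(y)  # [[[2.0, 4.0], [6.0, 8.0]]]
# ...and equally on 3x5, 7x7, or any other spatial size.
```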

How then was Jeremy able to train the model on 64x64 images and 200x200 images in deep learning part 1 v2?

You mean I first need a global average pool and then a 1x1 conv layer as a replacement for the FC layers?
But at the last layer I will still have to set the number of output units according to my number of classes.

Also, like @machinethink said, if the image size is too small then at some point the pooling layers will shrink the feature maps to nothing and an error will occur.

He probably used a different architecture. If you have a link to where in the lesson this was shown, I can go into more detail.

@radek is exactly right. fastai automatically handles this for you.

This is where jeremy did that.

I think I should peek into the library.

Yes, looking at the source of fastai is highly recommended.

That uses resnet34, which applies average pooling (with a 7x7 kernel) before the FC layer.

[I’m not 100% sure what would happen, but my guess is that if you used a very large input image, the output of the last conv layer would be greater than 7x7, the output of the pooling would be greater than 1 x 1 x channels, and you’d get an error again because the linear layer now receives too much data.]
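That guess can be checked with the standard pooling output-size formula (assuming a fixed 7x7 average pool with stride 7 and no padding, as in the original torchvision resnet definition):

```python
def pool_out(n, kernel=7, stride=7):
    """Output size of a fixed-kernel pooling layer with no padding:
    (n - kernel) // stride + 1."""
    return (n - kernel) // stride + 1

print(pool_out(7))   # 1 -> 224x224 input: the FC layer gets 1x1x512, happy
print(pool_out(14))  # 2 -> ~448x448 input: 2x2x512 features, FC layer errors
```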

Not exactly. As @radek said, it uses adaptive pooling (in the fastai library) to handle different image sizes automatically.

Nope no error! :slight_smile:


Good to know! :smile: (I had only looked at the model definition in torchvision, which does not use adaptive pooling.)

You’d definitely get an error if you didn’t take some measure (like adaptive pooling) to ensure that the output of the final conv layer stays the same size regardless of input size. I was very confused, as I had never heard of adaptive pooling before. This course is full of really interesting and useful nuggets of information, even if you have to deal with some confusion and forum searching to get to the bottom of things.