Changing Image Size during training!


I am trying to implement changing the image size during training in my own implementation (i.e. not with the fastai library), but whenever I change the size I get a size-mismatch error.
I want to know how @jeremy implemented this functionality in the library.

What I want to accomplish:
I want to train the same model with different image sizes, i.e. (224x224), (150x150), (100x100), (64x64).

How can I do it for my implementation?


You need to make the NN not care about the input size at some point. Global pooling and fully convolutional NNs are two examples of that :slight_smile:

Check out also the implementations of adaptive pooling layers in the fastai library.

I might be wrong on this one and can’t verify atm, but I think any model for transfer learning from the library will give you that.


I thought the default models in torchvision handled this for us! Am I wrong?

Some might, but not all. The VGG model in torchvision expects the output of the convolutional layers to be 7 x 7 pixels tall and wide. The model expects a 224x224 image as input and halves the spatial size 5 times. If you start with a 150x150 image, then 150/2^5 = 4 (rounded down), and the conv layers output a feature map of size 4 x 4. That does not match the size expected by the linear layers that follow the conv layers.
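The arithmetic above can be sketched in a few lines (`vgg_feature_size` is a hypothetical helper for illustration, not actual torchvision code):

```python
def vgg_feature_size(input_size: int, num_pools: int = 5) -> int:
    """Spatial size of the conv-feature map after `num_pools`
    2x2 max-pool layers, each halving the size (rounded down)."""
    size = input_size
    for _ in range(num_pools):
        size //= 2
    return size

print(vgg_feature_size(224))  # 7 -> matches the 7x7 the linear layers expect
print(vgg_feature_size(150))  # 4 -> size mismatch in the classifier
```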

In general, only models that apply some form of adaptive pooling before the linear layers, or models that do not use linear layers at all, will accept input images of any size. And even then you can get into trouble, since for very small input images the pooling/resizing steps may end up giving you an image of 0 x 0 pixels, which will result in an error message.
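Adaptive pooling sidesteps this by choosing the bin boundaries from the input size, so the output size is fixed no matter what comes in. Here is a minimal 1-D sketch of the idea (PyTorch's `nn.AdaptiveAvgPool2d` does the 2-D equivalent; this is an illustration, not the library's code):

```python
import math

def adaptive_avg_pool_1d(xs, out_size):
    """Average-pool `xs` into exactly `out_size` bins,
    whatever len(xs) is. Bin i covers indices
    [floor(i*n/out_size), ceil((i+1)*n/out_size))."""
    n = len(xs)
    out = []
    for i in range(out_size):
        start = (i * n) // out_size
        end = math.ceil((i + 1) * n / out_size)
        bin_vals = xs[start:end]
        out.append(sum(bin_vals) / len(bin_vals))
    return out

# out_size=1 is global average pooling: any input collapses to one value.
print(adaptive_avg_pool_1d([1.0, 2.0, 3.0, 4.0], 1))  # [2.5]
# A 9-element input still yields exactly 7 outputs.
print(len(adaptive_avg_pool_1d(list(range(9)), 7)))   # 7
```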


I believe fully convolutional models are the answer to your question. Instead of using Dense layers/Fully connected layers, you can use 1x1 convolution to get the same effect.
Jeremy talks a bit about this in the course, and you can find a more detailed explanation in Andrew Ng’s course:
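The reason a 1x1 convolution does not care about spatial size: it applies the same channels-in to channels-out linear map at every pixel. A toy pure-Python sketch (channels-first layout, no bias; just to show the idea):

```python
def conv1x1(x, w):
    """x: [C_in][H][W] feature map, w: [C_out][C_in] weights.
    Applies the same C_in -> C_out linear map at every spatial
    position, so it works for any H and W."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out = len(w)
    return [[[sum(w[o][i] * x[i][r][c] for i in range(c_in))
              for c in range(wd)]
             for r in range(h)]
            for o in range(c_out)]

# Works on a 2x2 map...
y = conv1x1([[[1, 2], [3, 4]]], [[2.0]])
print(y)  # [[[2.0, 4.0], [6.0, 8.0]]]
# ...and equally on 3x5, 7x7, or any other spatial size.
```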

How then was Jeremy able to train the model on 64x64 images and 200x200 images in deep learning part 1 v2?

You mean I first need a global average pool and then a 1x1 conv layer as a replacement for the FC layers?
But at the last layer I will still have to set the number of output units according to my number of classes.

Also, like @machinethink said, if the image size is too small then at some point the pooling layers will shrink the feature maps to nothing and an error will occur.

He probably used a different architecture. If you have a link to where in the lesson this was shown, I can go into more detail.

@radek is exactly right. fastai automatically handles this for you.

This is where jeremy did that.

I think I should peek into the library.

Yes, looking at the source of fastai is highly recommended.

That uses resnet34, which applies average pooling (with a 7x7 kernel) before the FC layer.

[I’m not 100% sure what would happen, but my guess is that if you used a very large input image, the output of the last conv layer would be greater than 7x7, the output of the pooling would be greater than 1 x 1 x channels, and you’d get an error again because the linear layer now receives too much data.]
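That guess can be checked with the standard pooling output-size formula (assuming a fixed 7x7 average pool with stride 7 and no padding, as in the original torchvision resnet definition):

```python
def pool_out(n, kernel=7, stride=7):
    """Output size of a fixed-kernel pooling layer with no padding:
    (n - kernel) // stride + 1."""
    return (n - kernel) // stride + 1

print(pool_out(7))   # 1 -> 224x224 input: the FC layer gets 1x1x512, happy
print(pool_out(14))  # 2 -> ~448x448 input: 2x2x512 features, FC layer errors
```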

Not exactly. As @radek said, it uses adaptive pooling (in the fastai library) to handle different image sizes automatically.

Nope no error! :slight_smile:


Good to know! :smile: (I had only looked at the model definition in torchvision, which does not use adaptive pooling.)

You’d definitely get an error if you didn’t take some measure (like adaptive pooling) to ensure that the output of the final conv layer stays the same size regardless of input size. I was very confused, as I had never heard of adaptive pooling before. This course is full of really interesting and useful nuggets of information, even if you have to deal with some confusion and forum searching to get to the bottom of things.