I am trying to implement changing the image size during training in my own implementation (i.e. not using the fastai library), but whenever I change the size I get a size-mismatch error.
I want to know how @jeremy implemented this functionality in the fast.ai library.
What I want to accomplish:
I want to train the same model with different image sizes, i.e. 224x224, 150x150, 100x100, 64x64.
Some models might, but not all. The VGG model in torchvision expects the output of the convolutional layers to be 7x7 pixels tall and wide. The model is designed for a 224x224 input image and halves the spatial size 5 times (224 / 2^5 = 7). If you start with a 150x150 image instead, then 150 / 2^5 = 4 (rounded down), and the conv layers output 4x4 feature maps. That does not match the size expected by the linear layers that follow the conv layers.
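You can check that arithmetic with a quick sketch (plain Python, no framework needed). The factor of 2^5 comes from VGG's five 2x2 max-pool layers, each of which halves the spatial size, rounding down:

```python
def vgg_feature_size(input_size, num_pools=5):
    """Spatial size of VGG's conv output: each of the five 2x2
    max pools halves the size, rounding down."""
    size = input_size
    for _ in range(num_pools):
        size //= 2
    return size

print(vgg_feature_size(224))  # 7 -> matches the 7x7 the linear layers expect
print(vgg_feature_size(150))  # 4 -> mismatch with the FC layers, hence the error
```

So any input whose size does not reduce to exactly 7 after five halvings will break the fixed-size linear layers.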
In general, only models that apply some form of adaptive pooling before the linear layers, or models that do not use linear layers at all, will accept input images of any size. Even then you can get into trouble: for very small input images, the pooling/resizing steps may eventually leave you with a 0x0 feature map, which also results in an error.
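Here is a minimal PyTorch sketch of the adaptive-pooling idea (the toy backbone is just for illustration, not any particular model): whatever spatial size the conv layers produce, `nn.AdaptiveAvgPool2d` always emits the same fixed output size, so the linear layers downstream never see a mismatch.

```python
import torch
import torch.nn as nn

# Toy conv backbone followed by adaptive pooling. The adaptive pool
# always outputs 7x7, no matter what spatial size comes out of the convs.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d((7, 7)),  # output is always 16 x 7 x 7
)

for size in (224, 150, 64):
    out = backbone(torch.randn(1, 3, size, size))
    print(size, tuple(out.shape))  # (1, 16, 7, 7) for every input size
```

The caveat from above still applies: if the input is so small that the feature map shrinks below the target size (or to 0x0) before the adaptive pool, you can still get errors or meaningless outputs.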
I believe fully convolutional models are the answer to your question. Instead of dense/fully connected layers, you can use a 1x1 convolution to get the same effect.
Jeremy talks a bit about this in the course, and you can find a more detailed explanation in Andrew Ng’s course:
You mean I first need a global average pool, and then a 1x1 conv layer as a replacement for the FC layers?
But I will still have to change the number of output units in the last layer to match my number of classes.
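That combination can be sketched like this in PyTorch (a hedged example, not fastai's actual implementation; `num_classes = 10` is an assumption you would change for your dataset). A 1x1 conv applied to a 1x1 feature map is mathematically the same as a linear layer, so the head below accepts conv feature maps of any spatial size:

```python
import torch
import torch.nn as nn

num_classes = 10  # assumption: set this to your number of classes

# Fully convolutional head: global average pooling collapses the spatial
# dims to 1x1, then a 1x1 conv plays the role of the FC layer.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),         # (N, C, H, W) -> (N, C, 1, 1)
    nn.Conv2d(512, num_classes, 1),  # 1x1 conv == linear layer on 1x1 maps
    nn.Flatten(),                    # (N, num_classes, 1, 1) -> (N, num_classes)
)

# Feature maps of different spatial sizes (as from different input sizes)
for h in (7, 5, 10):
    logits = head(torch.randn(2, 512, h, h))
    print(h, tuple(logits.shape))  # always (2, 10)
```

The 512 input channels here assume a resnet34-style backbone; only the final layer's output count depends on the number of classes, which is indeed the one thing you must change per dataset.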
Also, as @machinethink said, if the image size is too small, the pooling layers will eventually shrink the feature maps down to almost nothing and an error will occur.
That uses resnet34, which does an average pooling (with a 7x7 kernel) before the FC layer.
[I’m not 100% sure what would happen, but my guess is that if you’d use a very large input image, then the output of the last conv layer would be greater than 7x7 and the output of the pooling is greater than 1 x 1 x channels, and you’d get an error again because now the linear layer receives too much data.]
You’d definitely get an error if you don’t take some measure (like adaptive pooling) to ensure that the output of the final conv layer remains the same regardless of input size. I was very confused as I had never heard of adaptive pooling before. This course is full of really interesting and useful nuggets of information, even if you have to deal with some confusion and forum searching to get to the bottom of things.
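The fixed-pool-versus-adaptive-pool difference can be demonstrated with a small stand-in for a resnet-style head (a sketch, not the actual resnet34 code; the feature-map sizes are the ones a 7x7-pool head is built around):

```python
import torch
import torch.nn as nn

# Stand-in for a resnet-style head: fixed 7x7 average pool, then a
# linear layer sized for a 1 x 1 x 512 pooled output.
fixed_head = nn.Sequential(nn.AvgPool2d(7), nn.Flatten(), nn.Linear(512, 1000))

feats_224 = torch.randn(1, 512, 7, 7)    # conv output for a 224x224 input
feats_448 = torch.randn(1, 512, 14, 14)  # larger input -> 14x14 conv output

print(tuple(fixed_head(feats_224).shape))  # (1, 1000): sizes line up

try:
    fixed_head(feats_448)  # pool output is 2x2, so the linear layer gets 2048 values
except RuntimeError as e:
    print("size mismatch:", e)

# Swapping in adaptive pooling makes the head input-size-agnostic:
adaptive_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, 1000)
)
print(tuple(adaptive_head(feats_448).shape))  # (1, 1000)
```

This is exactly the "take some measure" point above: one adaptive pooling layer in front of the linear layer is enough to make the rest of the head independent of the input image size.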