In lesson 11 Jeremy says, “Nearly everyone I talked to thinks PyTorch CNNs have a fundamental limitation: they are tied to the input size because of max pooling.” And Jeremy argues that this hasn’t been true since VGG. I wonder how the two are related. I understand that replacing max pooling with average pooling could allow any input size (though I’m not sure about the technical details), but why does Jeremy say “it’s not true since VGG”?
It’s more about adaptive pooling vs. ordinary pooling; the pooling type itself (max or average) can be either. The people Jeremy mentions think you have to hard-code the input size as a constant. Say you have a network without adaptive pooling that was originally designed for 224x224 images. Near the head of the net, after the last convolution/activation, you have 7x7 feature maps, and you need to flatten them into a plain vector for a Linear layer. Without adaptive pooling you have to specify the kernel size of that pooling layer, and the right size depends on the size of the original input images (for that fixed architecture). If you later want to work with images of a different size, the feature maps will no longer be 7x7, so you’ll have to change that constant in the code because it won’t work otherwise.
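A minimal sketch of the difference (a toy backbone, not anyone’s actual architecture): the fixed head below bakes in the 7x7 assumption, while the adaptive head pools whatever spatial size arrives down to 1x1, so the same Linear layer works for any input size.

```python
import torch
import torch.nn as nn

# Toy "backbone": one strided conv standing in for a conv net with total stride 32,
# so a 224x224 input produces 7x7 feature maps with 512 channels.
body = nn.Conv2d(3, 512, kernel_size=3, stride=32, padding=1)

# Fixed head: hard-codes the 7x7 feature-map size that a 224x224 input produces.
fixed_head = nn.Sequential(nn.AvgPool2d(7), nn.Flatten(), nn.Linear(512, 10))

# Adaptive head: pools any incoming spatial size down to 1x1.
adaptive_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, 10))

x224 = torch.randn(1, 3, 224, 224)  # -> 7x7 feature maps
x448 = torch.randn(1, 3, 448, 448)  # -> 14x14 feature maps

print(fixed_head(body(x224)).shape)     # torch.Size([1, 10])
print(adaptive_head(body(x224)).shape)  # torch.Size([1, 10])
print(adaptive_head(body(x448)).shape)  # torch.Size([1, 10]) -- still works
# fixed_head(body(x448)) raises a RuntimeError: the 14x14 maps pool to 2x2,
# so the flattened vector has 2048 elements, not the 512 the Linear expects.
```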
There is a thread on the PyTorch forums that I think might be relevant.
BTW the solution at the end of the thread is quite nice.
Just use PyTorch’s adaptive pooling functions - no need for any workarounds.
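For reference, the API comes in both module and functional forms, and the target output size is arbitrary, not just 1x1:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 512, 13, 13)        # any spatial size works here
y = F.adaptive_avg_pool2d(x, 1)        # -> (1, 512, 1, 1), regardless of input H/W
z = F.adaptive_max_pool2d(x, (2, 2))   # arbitrary target sizes are allowed too
print(y.shape, z.shape)
```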