CNNs that work with any input size?

In one of the lessons, Jeremy mentioned that some of the newer CNNs work with input images of any size. He said that older CNNs like AlexNet and VGG do not have this property, whereas ResNet does. He also mentioned something called a Fully Convolutional Network (FCN). As far as I know, FCNs convert the fully connected layers into conv layers.

  1. How do FCNs help in working with any image size?
  2. How do ResNets work with any image size when VGG does not?

The problem is with the fully-connected layers at the end. The number of inputs to such a layer is fixed, say 25,088. That means the output from the layers before it must be a feature map with exactly 25,088 elements, for example a 7x7x512 feature map.

If the network scales the input image down by a factor of 32, a 7x7 final feature map corresponds to an input of 7 x 32 = 224 pixels per side. Therefore, such a network expects a 224x224 image as input.

If you give it a larger image, such as 320x320, then the final feature map (before the FC layers) is 10x10x512 = 51,200 elements. That does not fit the FC layer, which expects exactly 25,088 elements.
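To make the arithmetic concrete, here is a minimal pure-Python sketch. The names `DOWNSAMPLE`, `CHANNELS`, `FC_INPUTS` and `flattened_size` are made up for illustration, and it assumes the conv layers shrink the image by exactly 32 and end with 512 channels, as in the example above.

```python
# Assumptions for illustration: total downsampling factor of 32 and 512 final channels.
DOWNSAMPLE = 32
CHANNELS = 512
FC_INPUTS = 7 * 7 * 512  # 25,088 inputs expected by the first FC layer

def flattened_size(image_side: int) -> int:
    """Number of elements in the final feature map for a square input image."""
    side = image_side // DOWNSAMPLE
    return side * side * CHANNELS

for side in (224, 320):
    n = flattened_size(side)
    status = "fits" if n == FC_INPUTS else "does NOT fit"
    print(f"{side}x{side} input -> {n} elements, {status} the FC layer")
```

Only the 224x224 input produces the 25,088 elements the FC weight matrix was built for; any other size gives a different count.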

An FCN does not have fully-connected layers, so it does not matter how large or small that final feature map is. Usually there is a global pooling layer at the end that reduces it to 1x1x512 (or however many channels you have), and then a classification layer is added on top of that. This is also what ResNet does: it ends with a global average pooling layer followed by a single classification layer.
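A small pure-Python sketch of why global pooling removes the size restriction. It assumes global *average* pooling (max pooling would work the same way shape-wise), represents a feature map as a list of C channels each holding an H x W grid, and uses random numbers in place of real activations. The helper names and the 10-class classifier are hypothetical.

```python
import random

def random_feature_map(channels, height, width, seed=0):
    """Stand-in for real activations: C channels, each an H x W grid of floats."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(width)] for _ in range(height)]
            for _ in range(channels)]

def global_average_pool(fmap):
    """Average each channel over all spatial positions -> a list of C numbers."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def linear(x, weights, bias):
    """Fully-connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

C, NUM_CLASSES = 512, 10
rng = random.Random(1)
weights = [[rng.gauss(0.0, 0.01) for _ in range(C)] for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

# Final feature maps for 224x224 (7x7) and 320x320 (10x10) inputs.
for side in (7, 10):
    pooled = global_average_pool(random_feature_map(C, side, side))
    scores = linear(pooled, weights, bias)
    print(f"{side}x{side}x{C} -> pooled {len(pooled)} -> {len(scores)} class scores")
```

The same `weights` matrix works for both sizes, because after pooling the classifier always sees exactly C numbers.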

Note that a 1x1 convolution applied to a 1x1xC feature map is exactly the same thing as a fully-connected layer with C inputs. So the FCN really relies on the global pooling layer to make the final feature map fit the fully-connected (or 1x1 conv) classification layer.
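The equivalence is easy to check numerically. This is a minimal pure-Python sketch (the `conv1x1` and `linear` helpers are hypothetical, written just for this check): it applies the same K x C weights once as a fully-connected layer to a C-vector, and once as a 1x1 convolution to that vector reshaped into a C x 1 x 1 feature map.

```python
import math
import random

def conv1x1(fmap, weights, bias):
    """1x1 convolution. fmap: C x H x W nested lists; weights: K x C; bias: K.
    Each output pixel is a weighted sum over the C channels at that same (h, w)."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[[sum(weights[k][c] * fmap[c][h][w] for c in range(C)) + bias[k]
              for w in range(W)]
             for h in range(H)]
            for k in range(len(weights))]

def linear(x, weights, bias):
    """Fully-connected layer on a flat vector x of length C."""
    return [sum(wc * xc for wc, xc in zip(row, x)) + b for row, b in zip(weights, bias)]

C, K = 512, 10
rng = random.Random(0)
x = [rng.random() for _ in range(C)]  # e.g. the output of global pooling
weights = [[rng.gauss(0.0, 0.01) for _ in range(C)] for _ in range(K)]
bias = [rng.gauss(0.0, 0.01) for _ in range(K)]

fc_out = linear(x, weights, bias)
# Reshape the same vector into a C x 1 x 1 feature map and apply the 1x1 conv.
conv_out = conv1x1([[[v]] for v in x], weights, bias)
conv_flat = [channel[0][0] for channel in conv_out]

print(all(math.isclose(a, b) for a, b in zip(fc_out, conv_flat)))  # True
```

On a 1x1 map there is only one spatial position, so the 1x1 conv computes the same weighted sum over channels that the FC layer does.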


Awesome! Got it.
Thanks :smiley: