Lesson 3: the image resizing trick

How does the image resizing trick that Jeremy showed work in the backend? The learner he trained on 128x128 satellite images should have a different number of parameters from a learner for 256x256 images, so how are the weights transferred? In Keras we need to specify the input dimensions beforehand, so this trick does not seem to work there. Can anyone explain what's really going on, and how I can implement it in Keras? I have a project where I want to improve the results.

In traditional CNNs, people used to flatten the output of the last convolutional layer directly and pass it through a linear layer. Let's say the feature map from the last convolutional layer has shape (batch_size, 512, 8, 8) (this is what you’ll get if you pass a 256x256 image through a ResNet34). When you define the network, you then need to make sure that the number of input features of that linear layer is 512*8*8. That ties the linear layer to one particular image size.
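To make the problem concrete, here is a minimal sketch of that "flatten, then linear" head. The 10 output classes and the name `head` are my own assumptions for illustration; the feature map shapes are the ResNet34 ones mentioned above.

```python
import torch
from torch import nn

# Traditional head: flatten the conv feature map, then one linear layer.
# in_features is hard-coded for an 8x8 feature map (a 256x256 input image).
head = nn.Sequential(nn.Flatten(), nn.Linear(512 * 8 * 8, 10))

feature_map_256 = torch.randn(2, 512, 8, 8)  # from 256x256 images
feature_map_128 = torch.randn(2, 512, 4, 4)  # from 128x128 images

out_256 = head(feature_map_256)
print(out_256.shape)

# The 4x4 feature map flattens to 512*4*4 features, which the layer rejects.
try:
    head(feature_map_128)
    failed = False
except RuntimeError as err:
    failed = True
    print("128x128 input fails:", err)
```

So with this kind of head, the linear layer's weight matrix really does depend on the image size, and nothing could be reused when the size changes.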

On the other hand, when you use a predefined ResNet from torchvision, the output of the last convolutional layer is first passed through an AdaptiveAvgPool2d layer (refer to the image above), which produces the same output size in the last two dimensions for an input of any size.

So to be clear: if you pass in a 256x256 image, the feature map shape will be (batch_size, 512, 8, 8), and if you pass in a 128x128 image, it will be (batch_size, 512, 4, 4). When you pass these through the AdaptiveAvgPool2d layer, both are converted to (batch_size, 512, 1, 1). After flattening you get the same sized tensor in both cases, i.e. (batch_size, 512*1*1), so a linear layer defined with 512 input features works for both. The convolutional layers' weights depend only on kernel size and channel counts, not on image size, so the whole set of weights carries over unchanged.

Check out the code below:

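import torch
from torch import nn

# Feature maps as they come out of the last conv layer of ResNet34 (batch size 2)
feature_map_256 = torch.randn(2, 512, 8, 8)  # from 256x256 images
feature_map_128 = torch.randn(2, 512, 4, 4)  # from 128x128 images

# Averages each channel down to 1x1, whatever the spatial size of the input
avg = nn.AdaptiveAvgPool2d(output_size=1)

print(f"feature map size after adaptive avg pooling (256x256 image size) {avg(feature_map_256).shape}")
print(f"feature map size after adaptive avg pooling (128x128 image size) {avg(feature_map_128).shape}")

flatten = nn.Flatten()
linear = nn.Linear(512, 10)

# The same linear layer accepts both pooled feature maps
print(linear(flatten(avg(feature_map_256))).shape)
print(linear(flatten(avg(feature_map_128))).shape)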

Play with the code above and you’ll understand. (Here I’m using a batch size of two.)
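If it helps, here is the same idea as a tiny end-to-end network. This is only a sketch: `TinyNet`, its layer sizes, and the 10 classes are made up for illustration and are not the ResNet34 from the lesson. The point is that a single model object, with one fixed set of parameters, runs on both image sizes.

```python
import torch
from torch import nn

class TinyNet(nn.Module):
    """Hypothetical two-layer CNN with an adaptive pooling head."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)   # (batch, 32, H, W) -> (batch, 32, 1, 1)
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.body(x)
        x = self.pool(x).flatten(1)           # (batch, 32) for any H, W
        return self.head(x)

model = TinyNet()
n_params = sum(p.numel() for p in model.parameters())

out_128 = model(torch.randn(2, 3, 128, 128))
out_256 = model(torch.randn(2, 3, 256, 256))
print(n_params, out_128.shape, out_256.shape)
```

The parameter count is computed once, before any image is seen, which is why training on small images first and then continuing on larger ones needs no weight conversion.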


Thanks a lot, great explanation. How would you implement this flexibility in Keras? Any ideas?

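Check out this link, it deals with that problem: https://stackoverflow.com/questions/52934764/keras-adaptive-max-pooling. Keras has no adaptive pooling layer. Instead it has global pooling layers (GlobalAveragePooling2D and GlobalMaxPooling2D), which reduce each channel to a single value and so behave like adaptive pooling with an output size of 1.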
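A rough sketch of how this could look in Keras, assuming the TensorFlow backend. The `build_model` helper and its small conv stack are hypothetical stand-ins for your own network. Two options are shown: leaving height and width as None so one model accepts any size, or building two fixed-size models and copying the weights across, which works because every weight shape is identical.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_model(input_size=None, num_classes=10):
    # input_size=None leaves height and width unspecified
    inputs = tf.keras.Input(shape=(input_size, input_size, 3))
    x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)  # (batch, 32) for any height/width
    outputs = layers.Dense(num_classes)(x)
    return tf.keras.Model(inputs, outputs)

# Option 1: one model with a flexible input size
flexible = build_model()
out_128 = flexible(np.zeros((2, 128, 128, 3), dtype="float32"))
out_256 = flexible(np.zeros((2, 256, 256, 3), dtype="float32"))
print(out_128.shape, out_256.shape)

# Option 2: two fixed-size models; weights trained on the small one
# can be loaded into the large one because the shapes match
small = build_model(128)
large = build_model(256)
large.set_weights(small.get_weights())
print(small.count_params(), large.count_params())
```

Note that this only works if the head uses global pooling. With a Flatten layer straight after the convolutions, the Dense layer's weight shape would depend on the image size, and `set_weights` would fail.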