Ideas behind adaptive max pooling

Is there a different name for adaptive max pooling? Where can I read more about it? Jeremy mentioned in lesson 7 that there was a paper written about it. Anyone know which paper he is referring to?

@alwc I think Jeremy means that he came up with concatenating average and max pooling in the last layer himself, and that someone else independently wrote a paper about it. As for the term “Adaptive”, it makes the following two lines of code equivalent:
AvgPool2d(kernel_size=7, stride=7, padding=0)  # the three parameters ensure the output activation is 1x1
nn.AdaptiveAvgPool2d(1)  # no kernel_size, stride or padding; instead we specify the desired output size, i.e. 1x1
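For reference, here is a minimal sketch of what that concatenation of average and max pooling could look like in PyTorch (the module name ConcatPool2d is just illustrative, not a library class):

import torch
import torch.nn as nn

class ConcatPool2d(nn.Module):
    # Concatenate adaptive average pooling and adaptive max pooling along the channel dimension.
    def __init__(self, output_size=1):
        super().__init__()
        self.avg = nn.AdaptiveAvgPool2d(output_size)
        self.max = nn.AdaptiveMaxPool2d(output_size)

    def forward(self, x):
        # Output has twice as many channels as the input.
        return torch.cat([self.avg(x), self.max(x)], dim=1)

pool = ConcatPool2d(1)
print(pool(torch.randn(2, 512, 7, 7)).shape)  # torch.Size([2, 1024, 1, 1])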

Anyway, I still don’t know which paper found that concatenating average and max pooling works better than either one on its own.

Hi @nishant_g, thanks for the prompt reply.

I’ve tested it out in PyTorch 0.3.1 and here are the results:

>>> x = torch.randn(1, 1, 8, 8)

>>> m1 = nn.AvgPool2d(kernel_size=7, stride=7, padding=0)
>>> i1 = Variable(x)
>>> o1 = m1(i1)
>>> o1.shape
torch.Size([1, 1, 1, 1])

>>> m2 = nn.AdaptiveAvgPool2d(1)
>>> i2 = Variable(x)
>>> o2 = m2(i2)
>>> o2.shape
torch.Size([1, 1, 1, 1])

>>> o1
Variable containing:
(0 ,0 ,.,.) = 
 -0.1110
[torch.FloatTensor of size 1x1x1x1]

>>> o2
Variable containing:
(0 ,0 ,.,.) = 
1.00000e-02 *
   1.3406
[torch.FloatTensor of size 1x1x1x1]

Although the shapes are equivalent, the outputs are not the same. Did I do anything wrong?

P.S. if x = torch.randn(1, 1, 7, 7), o1 will be equal to o2.

The other name for it is “global pooling”, although they are not 100% the same. With global avg/max pooling the size of the resulting feature map is 1x1xchannels. With adaptive pooling, you can reduce it to any feature map size you want, although in practice we often choose size 1, in which case it does the same thing as global pooling.

What happens is exactly the same as with regular average or max pooling, but instead of specifying the size of the pooling window (e.g. 2x2) you specify the size of the output feature map that you want (and the size of the pooling window is automatically computed from that).
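As a quick illustration (assuming a recent PyTorch version, where plain tensors can be passed in without wrapping them in Variable):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 10, 10)

# Global pooling: the output feature map is 1x1 regardless of the input size
print(nn.AdaptiveAvgPool2d(1)(x).shape)        # torch.Size([1, 3, 1, 1])

# Adaptive pooling to any target size; the window size is derived automatically
print(nn.AdaptiveAvgPool2d((2, 2))(x).shape)   # torch.Size([1, 3, 2, 2])
print(nn.AdaptiveMaxPool2d((5, 3))(x).shape)   # torch.Size([1, 3, 5, 3])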

Just change this line to x = torch.randn(1, 1, 7, 7) and you will get the same output in both cases.
The reason is that a 7x7 filter fits exactly over a 7x7 activation, whereas one row and one column are left out when the same 7x7 filter is applied to an 8x8 activation.
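A quick way to check this numerically (in a recent PyTorch version):

import torch
import torch.nn as nn

# AvgPool2d(7, stride=7) only covers the top-left 7x7 window of the 8x8 input,
# while AdaptiveAvgPool2d(1) averages over all 64 values.
x = torch.randn(1, 1, 8, 8)
o1 = nn.AvgPool2d(kernel_size=7, stride=7, padding=0)(x)
o2 = nn.AdaptiveAvgPool2d(1)(x)
print(torch.allclose(o1, x[:, :, :7, :7].mean()))  # True
print(torch.allclose(o2, x.mean()))                # True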

Thanks @nishant_g, I’ve figured that out after a while just now!

I think this is the paper:

https://arxiv.org/abs/1406.4729

It’s curious that it didn’t throw an error when nn.AvgPool2d with a 7x7 kernel was applied to an 8x8 input.
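By default AvgPool2d floors the output size and silently ignores the leftover row and column, so no error is raised; setting ceil_mode=True would include the partial window instead. A small sketch (assuming a recent PyTorch version):

import torch
import torch.nn as nn

x = torch.randn(1, 1, 8, 8)
# floor((8 - 7) / 7) + 1 = 1 -> the leftover row/column is silently dropped
print(nn.AvgPool2d(kernel_size=7, stride=7)(x).shape)                  # torch.Size([1, 1, 1, 1])
# ceil mode keeps the partial window, giving a 2x2 output
print(nn.AvgPool2d(kernel_size=7, stride=7, ceil_mode=True)(x).shape)  # torch.Size([1, 1, 2, 2])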