Beyond Global Average Pooling - How to pre-process each channel across locations

Newer vision models, like ResNet, use global average pooling at the end, before their dense layers.

Rather than a single global average pooling for each channel, I’d like to apply a single 7x7 convolution to each channel separately. The convolution could then learn whether average, max, or some other pooling works best.

I know how to do a 7x7 convolution over all channels together (resulting in a huge number of weights). How can I do this for each channel separately?

Depending on what you mean precisely, you might want to look into depthwise separable convolutions.

There’s an implementation in Keras as part of the MobileNet application.

Awesome - thanks!

Just to clarify: a depthwise separable convolution is the combination of a depthwise convolution followed by a pointwise (1x1) convolution. What you’re looking for is just the first part, the depthwise convolution, not the 1x1 convolution that follows it.
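
To make the idea concrete, here is a minimal NumPy sketch of a depthwise 7x7 convolution applied to a 7x7 feature map with no padding, which reduces each channel to a single learned weighted sum. The shapes (batch of 2, 7x7 map, 2048 channels) and the helper name `depthwise_full_conv` are assumptions for illustration only, not anything from a library. In Keras, the layer that corresponds to this is `DepthwiseConv2D`.

```python
import numpy as np


def depthwise_full_conv(features, kernels, bias=None):
    """Depthwise convolution whose kernel covers the whole feature map.

    features: array of shape (batch, H, W, C)
    kernels:  array of shape (H, W, C), one H x W filter per channel
    Returns an array of shape (batch, C): one value per channel.
    """
    if features.shape[1:] != kernels.shape:
        raise ValueError("kernel must match the feature map's (H, W, C)")
    # Each channel is multiplied by its own filter and summed over locations.
    # Channels are never mixed, which is what makes this "depthwise".
    out = np.einsum("bhwc,hwc->bc", features, kernels)
    if bias is not None:
        out = out + bias
    return out


# Assumed shapes for illustration: a 7x7 final feature map with 2048 channels.
rng = np.random.default_rng(0)
batch, size, channels = 2, 7, 2048
features = rng.standard_normal((batch, size, size, channels))

# With every weight set to 1/49, the depthwise conv is global average pooling.
avg_kernels = np.full((size, size, channels), 1.0 / (size * size))
pooled = depthwise_full_conv(features, avg_kernels)
matches_gap = np.allclose(pooled, features.mean(axis=(1, 2)))

# Weight counts (ignoring biases): depthwise vs. a full 7x7 convolution
# that maps all channels to the same number of output channels.
depthwise_weights = size * size * channels
full_conv_weights = size * size * channels * channels
print(matches_gap, depthwise_weights, full_conv_weights)
```

Initializing the filters to 1/49 starts the layer out as plain global average pooling, and training can then move it toward any other weighted sum over locations. One caveat: this operation is linear, so it can learn weighted averages but cannot reproduce max pooling exactly.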