Newer Vision models, like Resnet, use global average pooling at the end before their dense layers.
Rather than a single global average pooling for each channel, I’d like to do a single 7x7 convolution of each channel (separately). This convolution can then figure out if the average/max/other pooling is best.
I know how to do a 7x7 convolution of all channels together (resulting in a huge number of weights). How to do this with each channel separately?