Newer vision models, like ResNet, use global average pooling at the end, before their dense layers.
Rather than a single global average pooling for each channel, I’d like to do a single 7x7 convolution of each channel (separately). This convolution can then learn whether average, max, or some other pooling works best.
I know how to do a 7x7 convolution of all channels together (resulting in a huge number of weights). How do I do this with each channel separately?
Just to clarify: a depthwise separable convolution is the combination of a depthwise convolution followed by a pointwise (or 1x1) convolution. What you’re looking for is just the first part, the depthwise convolution, not the 1x1 convolution that follows it.
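As a rough sketch of what that could look like, here is a depthwise 7x7 convolution written in PyTorch. The thread doesn't name a framework, so PyTorch is an assumption here, as are the 512 channels and 7x7 feature map (what a ResNet-18/34 produces for a 224x224 input) and the variable names. Setting `groups` equal to the number of channels gives each channel its own 7x7 filter:

```python
import torch
import torch.nn as nn

channels = 512  # assumed: ResNet-18/34 final feature map is 512 x 7 x 7

# groups=channels means every input channel gets its own 7x7 filter
# and no channel sees any other channel (a depthwise convolution).
depthwise_pool = nn.Conv2d(channels, channels, kernel_size=7,
                           groups=channels, bias=False)

# Optional: initialise every weight to 1/49 so the layer starts out
# behaving exactly like global average pooling.
nn.init.constant_(depthwise_pool.weight, 1.0 / 49)

# A regular 7x7 convolution over all channels, for comparison only.
full_conv = nn.Conv2d(channels, channels, kernel_size=7, bias=False)

features = torch.randn(2, channels, 7, 7)  # dummy batch of feature maps
pooled = depthwise_pool(features)          # shape: (2, 512, 1, 1)

n_depthwise = sum(p.numel() for p in depthwise_pool.parameters())
n_full = sum(p.numel() for p in full_conv.parameters())
print(pooled.shape, n_depthwise, n_full)
```

The depthwise layer has 512 × 49 = 25,088 weights, against 512 × 512 × 49 = 12,845,056 for the full convolution. The output can be flattened and fed to the dense layers the same way the average-pooled output would be.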