Does anyone know of any efforts being made to automate the pooling process by incorporating it into SGD? To expand upon this, I’ll use average pooling as an example.

Let’s say we’re taking a 2x2 section and using average pooling. One way to think of this is as putting a 0.25 weight on each of the 4 pixels and summing them. Maybe those weights could themselves be optimized through SGD? Has anyone heard of any work along these lines? That way, the network could learn which pooling works best at different depths in the network.
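To make the equivalence concrete, here’s a quick PyTorch sketch (shapes are just placeholders) showing that 2x2 average pooling is the same as a stride-2 convolution with all four weights fixed at 0.25:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)

# Standard 2x2 average pooling (stride defaults to the kernel size)
avg = F.avg_pool2d(x, kernel_size=2)

# The same thing as a 2x2 convolution with fixed 0.25 weights
w = torch.full((1, 1, 2, 2), 0.25)
conv = F.conv2d(x, w, stride=2)

print(torch.allclose(avg, conv))  # True
```

The question is then whether `w` should be a learnable parameter instead of a constant.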

I’m not aware of anyone doing this (but I’m not an expert).

However…

Using convolution with a stride > 1 is kinda similar to this idea. The stride causes downsampling, while the convolution uses learned weights.
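For example, in PyTorch (the channel count of 16 is just an illustration), a 2x2 convolution with stride 2 halves the spatial resolution exactly like 2x2 pooling would, except the combination weights are learned:

```python
import torch
import torch.nn as nn

# 2x2 kernel with stride 2: downsamples by 2 in each spatial
# dimension, with learned weights instead of fixed averaging.
downsample = nn.Conv2d(in_channels=16, out_channels=16,
                       kernel_size=2, stride=2)

x = torch.randn(1, 16, 32, 32)
y = downsample(x)
print(y.shape)  # torch.Size([1, 16, 16, 16])
```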

I guess what you’re describing is really depthwise convolution with stride > 1, not regular convolution. In other words, you’re not changing the number of channels.
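In PyTorch that’s the `groups` argument (again with a hypothetical channel count): setting `groups` equal to `in_channels` gives each channel its own learned 2x2 “pooling” kernel, with no mixing across channels:

```python
import torch
import torch.nn as nn

# groups == in_channels makes this depthwise: one independent
# 2x2 learned kernel per channel, channel count unchanged.
depthwise_pool = nn.Conv2d(16, 16, kernel_size=2, stride=2, groups=16)

x = torch.randn(1, 16, 32, 32)
print(depthwise_pool(x).shape)  # torch.Size([1, 16, 16, 16])
```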

Or maybe you want to learn different weights for each 2x2 block of pixels? (In other words, one weight per input pixel, whereas a convolution learns only a single shared set of 2x2 weights.) In that case you could use a “locally-connected” layer with a stride of 2.
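PyTorch doesn’t ship a locally-connected 2D layer, but here’s a rough sketch of one built on `unfold` (the class name and shapes are my own invention). Initialized with every weight at 0.25, it starts out as exact average pooling and can then learn a separate weight for every input pixel:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocallyConnectedPool(nn.Module):
    """Learned 2x2 pooling with no weight sharing: one weight per
    input pixel, initialized to plain average pooling (0.25 each)."""
    def __init__(self, channels, in_h, in_w):
        super().__init__()
        self.out_h, self.out_w = in_h // 2, in_w // 2
        # one set of 4 weights per channel per output location
        self.weight = nn.Parameter(
            torch.full((channels, self.out_h * self.out_w, 4), 0.25))

    def forward(self, x):
        n, c = x.shape[:2]
        # extract non-overlapping 2x2 blocks: (n, c*4, L), L = out_h*out_w
        patches = F.unfold(x, kernel_size=2, stride=2)
        patches = patches.view(n, c, 4, -1)
        # weighted sum over the 4 pixels of each block
        out = (patches * self.weight.permute(0, 2, 1)).sum(dim=2)
        return out.view(n, c, self.out_h, self.out_w)
```

At initialization its output matches `F.avg_pool2d(x, 2)`; SGD is then free to move each per-pixel weight independently.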