Convolutional Subsampling vs Pooling Layers

Besides speed, is there really any advantage to using >1 subsampling during convolution as opposed to doing, for example, max pooling? In my mind, if you separate out pooling into it’s own operation, you not only reduce the dimension… but also have more control of how you want to do it, i.e. max, min, avg, etc. With convolutional subsampling, you purely reduce the dimension but without performing any operation. Is that right?

Do any of you know practical / real applicable examples where it makes sense to use one vs the other?

EDIT: For clarity, when I say pooling I mean conv followed by pooling.


Convolutions with stride>1 (which I believe is what you mean by conv subsampling) may work better for generative models (segmentation, style transfer, etc) since they can be better reversed.

For classification, combining max and average pooling seems to be best (although maxout may be even better), although I haven’t seen a thorough comparison vs subsampling.


A quick note from signal processing point of view. Strides correspond to downsampling. Max pooling looks like envelope extraction. Avg pool acts like low pass filtering and then downsampling, also called decimation. Pure down-sampling can introduce the so called aliasing (overlaping of the signal in the frequency domain) if the signal can change too fast (e.g. noise). In this case low pass filtering is needed. Envelope extraction is usefull when the signal has some high frequency oscillation (modulation) that we do not need and want to discard.

How all this plays with the deep learning I don’t know yet :slight_smile:


Epic response–thanks for the insight!

Thank you for the insight on stride vs avg pooling in signal processing context. In the case of audio super resolution, based on the insight here, it seems one would want to use avg pooling than strides. @samwit

It could be but it is possible that the filters before has learned the proper weights and already smoothed the signal. Putting avg pooling looks like hard-coded convolutional layer with strides.