Besides speed, is there really any advantage to using a subsampling factor (stride) greater than 1 during convolution, as opposed to doing, for example, max pooling? In my mind, if you separate pooling out into its own operation, you not only reduce the dimension but also have more control over how you do it, i.e. max, min, avg, etc. With strided convolution, you purely reduce the dimension by skipping positions, without performing any extra operation on the values. Is that right?
Does anyone know of practical, real-world examples where it makes sense to use one over the other?
EDIT: For clarity, when I say pooling I mean conv followed by pooling.
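
To make the comparison concrete, here is a minimal 1-D sketch in plain Python (no framework). The helper names `conv1d` and `max_pool1d`, the input `x`, and the kernel `w` are made up for illustration. It assumes "valid" convolution with no padding, and it suggests that a stride-2 convolution gives the same values as a stride-1 convolution with every other output thrown away, whereas conv followed by max pooling looks at every stride-1 output before reducing.

```python
def conv1d(x, w, stride=1):
    """Valid (no padding) 1-D cross-correlation of list x with kernel w."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k))
            for i in range(0, len(x) - k + 1, stride)]


def max_pool1d(x, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


# Hypothetical toy input and kernel, chosen only for illustration.
x = [1, 3, 2, 5, 4, 6, 0, 7, 2]
w = [1, 0, -1]

dense = conv1d(x, w, stride=1)    # every output position
strided = conv1d(x, w, stride=2)  # convolutional subsampling
pooled = max_pool1d(dense, 2)     # conv followed by max pooling

# Strided conv keeps every 2nd stride-1 output and never computes the rest.
print(strided == dense[::2])
print(dense, strided, pooled)
```

In this sketch the strided version does roughly half the multiply-adds, which is the speed advantage, while the pooled version pays for the full stride-1 convolution but lets you choose the reduction (max, min, avg). Note that the two output lengths can differ slightly because of how the windows line up at the edge.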