I’ve been reviewing a few papers on Depthwise Separable Convolutions and Inception architectures, and they keep pointing to 1x1 convolutions (a.k.a. pointwise convolutions) being used for dimensionality reduction.
In some ways this is counterintuitive. You would expect that inserting a new layer into your model would create MORE parameters, not fewer. However, I can see how this works if the newly inserted layer of pointwise convolutions has fewer filters than the layer that comes before it: the layer that follows then operates on fewer channels, which reduces both the number of operations needed for inference and the number of parameters in the model.
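To sanity-check my own arithmetic, here’s a quick parameter count I put together (using PyTorch, though the numbers are just the usual K×K×C_in×C_out weight counts plus biases; the 256 → 64 → 256 channel sizes are an Inception-style bottleneck I picked for illustration):

```python
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

# Direct path: a single 3x3 conv mapping 256 -> 256 channels
direct = nn.Conv2d(256, 256, kernel_size=3, padding=1)

# Bottleneck path: a 1x1 conv first reduces 256 -> 64 channels,
# so the 3x3 conv only has to operate on 64 input channels
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),              # pointwise reduction
    nn.Conv2d(64, 256, kernel_size=3, padding=1),   # cheaper 3x3 conv
)

print(count_params(direct))      # 3*3*256*256 + 256 = 590,080
print(count_params(bottleneck))  # (256*64 + 64) + (3*3*64*256 + 256) = 164,160
```

So even though the pointwise layer adds parameters of its own, the combined stack is roughly 3.6x smaller than the single wide 3x3 conv, which matches my reading of the papers.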
Do pointwise convolutions only reduce the number of parameters in a model when they have fewer filters than the layers they follow, or is there something else I’m missing that reduces the dimensionality of the model when you add a pointwise convolution layer?
Here are the papers I’ve read that discuss pointwise convolutions. If there’s a better resource, please send it my way.
Thanks for your help!