Pointwise Convolutions and Dimensionality Reduction

I’ve been reviewing a few papers on Depthwise Separable Convolutions and Inception architectures and they keep pointing to 1x1 convolutions (a.k.a. pointwise convolutions) being used for dimensionality reduction.

In some ways this is counterintuitive. You would expect that inserting a new layer into your model would create MORE parameters, not fewer. However, I can see how this works if the newly inserted layer of pointwise convolutions has fewer filters than the layer that comes before it. In this case, the layer that follows sees fewer input channels, so you reduce both the number of operations needed for inference and the number of parameters in the model.
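To make the arithmetic concrete, here is a rough sketch that counts weights with and without a pointwise layer in front of a 3x3 convolution. The channel sizes (256, 64) and the helper name conv_params are made up for illustration and are not taken from any of the papers; biases are ignored.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weight count of a standard 2D convolution layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels

# Hypothetical channel sizes, chosen only for illustration.
c_in, c_mid, c_out = 256, 64, 256

# A single 3x3 convolution going straight from c_in to c_out channels.
direct = conv_params(c_in, c_out, 3)

# A 1x1 convolution with FEWER filters (c_mid < c_in), then the 3x3 convolution.
bottleneck = conv_params(c_in, c_mid, 1) + conv_params(c_mid, c_out, 3)

# A 1x1 convolution that keeps the channel count (c_in filters), then the 3x3.
no_reduction = conv_params(c_in, c_in, 1) + conv_params(c_in, c_out, 3)

print(direct)        # 589824
print(bottleneck)    # 163840
print(no_reduction)  # 655360
```

With these numbers the extra layer only pays for itself when it has fewer filters than its input has channels; when it keeps the channel count, the total goes up.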

Do pointwise convolutions only reduce the number of parameters in a model when they have fewer filters than the layers they follow, or is there something else I’m missing that reduces the dimensionality of the model when you add a pointwise convolution layer to it?

Here are the papers I’ve read that discuss pointwise convolutions. If there’s a better resource please send it my way.

Thanks for your help!


Your understanding of the issue is spot on. Nothing fancy is going on in those 1x1 convolutions. Every “pixel” in the feature maps at the output of the 1x1 convolution is just a linear combination of the “pixels” at the same position in the input feature maps. The pic below illustrates this point (with N input feature maps and M output feature maps, you get the dimensionality reduction if M < N).
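If it helps, here is a minimal pure-Python sketch of that per-pixel linear combination. The function name pointwise_conv, the channel-first list layout, and the toy values are my own choices for illustration; a real framework would do the same thing with tensors.

```python
def pointwise_conv(x, w):
    """Apply a 1x1 convolution (no bias).

    x: input feature maps indexed as x[channel][row][col], N channels
    w: weights indexed as w[out_channel][in_channel], shape M x N
    Returns M output feature maps with the same height and width as the input.
    """
    n_in = len(x)
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            # Each output pixel mixes the N input pixels at the same (i, j).
            [sum(w_out[c] * x[c][i][j] for c in range(n_in)) for j in range(width)]
            for i in range(height)
        ]
        for w_out in w
    ]

# Toy input: N = 3 channels, each 2x2 (values are arbitrary).
x = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
# M = 2 filters, so M < N and the channel dimension shrinks from 3 to 2.
w = [
    [1, 1, 1],
    [1, 0, -1],
]

y = pointwise_conv(x, w)
print(len(y), len(y[0]), len(y[0][0]))  # 2 2 2
print(y[0])  # [[111, 222], [333, 444]]
print(y[1])  # [[-99, -198], [-297, -396]]
```

The spatial size is untouched; only the number of feature maps changes, from N to M.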

One other thing about pointwise convolutions (mentioned in the Inception paper) is that they can be dual purpose, since by adding a non-linear activation function to them (like ReLU) you are also adding depth to your model.
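As a small sketch of that second role, here is a 1x1 convolution followed by ReLU at a single spatial position. The names pointwise_relu and relu and the numbers are hypothetical, chosen only to show the non-linearity clipping a negative response.

```python
def relu(v):
    """Rectified linear unit: negative values become zero."""
    return max(0.0, v)

def pointwise_relu(pixel, w):
    """1x1 convolution followed by ReLU at one spatial position.

    pixel: the N channel values at that position
    w: weights indexed as w[out_channel][in_channel], shape M x N
    """
    return [relu(sum(wi * p for wi, p in zip(row, pixel))) for row in w]

pixel = [1.0, -2.0, 3.0]                  # N = 3 input channels
w = [[1.0, 1.0, 1.0], [1.0, 0.0, -1.0]]   # M = 2 filters

out = pointwise_relu(pixel, w)
print(out)  # [2.0, 0.0]
```

Without the ReLU the second output would be -2.0, and the layer would be a purely linear map that could be folded into the next linear layer.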

(Found the picture on a Japanese website via Google: http://qiita.com/yu4u/items/34cd33b944d8bdca142d, apparently part of a short explanation of Xception and depthwise separable convolutions.)


Thank you for confirming my understanding was correct. Much appreciated!