Rearranging Convolutions as Matrix Products

I’m currently working through lesson 15 of the fastai course and have finished the portion about convolutions. Here, I explain how the convolution operation can be rearranged, or unrolled, into a matrix dot product.

CNNs differ from standard NNs in that, rather than a given node, or neuron, having a weight for each and every input, the node contains only a small set of weights that is applied repeatedly across the input. In CNNs, this node, or neuron, is known as the kernel.

Say we have the kernel, or node, \begin{array}{|c|c|} \hline 9 & 8 \\ \hline 7 & 6 \\ \hline \end{array} and the input tensor \begin{array}{|c|c|c|} \hline 1 & 2 & 3 \\ \hline 4 & 5 & 6 \\ \hline 7 & 8 & 9 \\ \hline \end{array} .

Convolving these two together would involve first convolving \begin{array}{|c|c|} \hline 9 & 8 \\ \hline 7 & 6 \\ \hline \end{array} and \begin{array}{|c|c|} \hline 1 & 2 \\ \hline 4 & 5 \\ \hline \end{array} . To convolve, simply multiply the respective elements of the two matrices and sum the products: 9 \cdot 1 + 8 \cdot 2 + 7 \cdot 4 + 6 \cdot 5 = 83.

The next convolution involves \begin{array}{|c|c|} \hline 9 & 8 \\ \hline 7 & 6 \\ \hline \end{array} and \begin{array}{|c|c|} \hline 2 & 3 \\ \hline 5 & 6 \\ \hline \end{array} , then \begin{array}{|c|c|} \hline 9 & 8 \\ \hline 7 & 6 \\ \hline \end{array} and \begin{array}{|c|c|} \hline 4 & 5 \\ \hline 7 & 8 \\ \hline \end{array} , and so on.

The resulting activation map in this case would be \begin{array}{|c|c|} \hline 83 & 113 \\ \hline 173 & 203 \\ \hline \end{array} .
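In code, this sliding-window approach looks something like the following minimal sketch, written with plain nested loops over PyTorch tensors (the names `inp`, `kernel`, and `conv2d_loops` are mine, not from the lesson):

```python
import torch

inp = torch.tensor([[1., 2., 3.],
                    [4., 5., 6.],
                    [7., 8., 9.]])
kernel = torch.tensor([[9., 8.],
                       [7., 6.]])

def conv2d_loops(inp, kernel):
    kh, kw = kernel.shape
    out_h, out_w = inp.shape[0] - kh + 1, inp.shape[1] - kw + 1
    out = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            # multiply the kernel with one patch element-wise, then sum
            out[i, j] = (inp[i:i + kh, j:j + kw] * kernel).sum()
    return out

print(conv2d_loops(inp, kernel))
# tensor([[ 83., 113.],
#         [173., 203.]])
```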

However, programmatically, the approach we just performed involves the extensive use of for loops, which are quite slow and inefficient. If we pay close attention to the convolutions performed above, we can see that we are repeatedly taking products and summing them, which is exactly what a dot product is.

Therefore, it is possible to rewrite the convolution operation as a matrix dot product, which is significantly quicker to compute, as also demonstrated in this 2006 paper.

If we look again at the convolution operations above, \begin{array}{|c|c|} \hline 9 & 8 \\ \hline 7 & 6 \\ \hline \end{array} * \begin{array}{|c|c|} \hline 1 & 2 \\ \hline 4 & 5 \\ \hline \end{array} can be rewritten as \begin{bmatrix} 9 & 8 & 7 & 6 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \\ 4 \\ 5 \end{bmatrix} or \begin{bmatrix} 1 & 2 & 4 & 5 \end{bmatrix} \cdot \begin{bmatrix} 9 \\ 8 \\ 7 \\ 6 \end{bmatrix}, both of which evaluate to 83. This latter notation more closely resembles the typical linear combination notation in neural networks: \text{inputs}@\text{kernel} = \text{inputs}@\text{weights} = x@w = x \cdot w^T.
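As a quick sketch of that single rewrite (the variable names here are just illustrative):

```python
import torch

patch = torch.tensor([1., 2., 4., 5.])    # top-left 2x2 patch, flattened
weights = torch.tensor([9., 8., 7., 6.])  # kernel, flattened

print(patch @ weights)  # tensor(83.)
```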

If we go back to our original kernel and input tensors, \begin{array}{|c|c|} \hline 9 & 8 \\ \hline 7 & 6 \\ \hline \end{array} and \begin{array}{|c|c|c|} \hline 1 & 2 & 3 \\ \hline 4 & 5 & 6 \\ \hline 7 & 8 & 9 \\ \hline \end{array}, we can rearrange these matrices like this:

\begin{bmatrix} 1 & 2 & 4 & 5 \\ 2 & 3 & 5 & 6 \\ 4 & 5 & 7 & 8 \\ 5 & 6 & 8 & 9 \end{bmatrix} \cdot \begin{bmatrix} 9 \\ 8 \\ 7 \\ 6 \end{bmatrix}
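In PyTorch, that patch matrix can be built with `torch.nn.functional.unfold` (the operation often called im2col) rather than by hand. A minimal sketch, assuming the same kernel and input as above and with the name `patches` being mine:

```python
import torch
import torch.nn.functional as F

inp = torch.tensor([[1., 2., 3.],
                    [4., 5., 6.],
                    [7., 8., 9.]])
kernel = torch.tensor([[9., 8.],
                       [7., 6.]])

# unfold expects (batch, channels, height, width) and returns one
# flattened patch per column; transposing gives one patch per row
patches = F.unfold(inp[None, None], kernel_size=2)[0].T
print(patches)
# tensor([[1., 2., 4., 5.],
#         [2., 3., 5., 6.],
#         [4., 5., 7., 8.],
#         [5., 6., 8., 9.]])

out = patches @ kernel.reshape(-1)  # one matrix product instead of nested loops
print(out.reshape(2, 2))
# tensor([[ 83., 113.],
#         [173., 203.]])
```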

If we have multiple kernels, or multiple nodes, then it’s as simple as adding another column to the second matrix.

\begin{bmatrix} 1 & 2 & 4 & 5 \\ 2 & 3 & 5 & 6 \\ 4 & 5 & 7 & 8 \\ 5 & 6 & 8 & 9 \end{bmatrix} \cdot \begin{bmatrix} 9 & 2 \\ 8 & 0 \\ 7 & 2 \\ 6 & 0 \end{bmatrix}
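A sketch of this two-kernel case, using the same unfold trick and checking the values against PyTorch's built-in `F.conv2d` (the second kernel's values and the variable names are just for illustration):

```python
import torch
import torch.nn.functional as F

inp = torch.tensor([[1., 2., 3.],
                    [4., 5., 6.],
                    [7., 8., 9.]])
k1 = torch.tensor([[9., 8.], [7., 6.]])
k2 = torch.tensor([[2., 0.], [2., 0.]])

patches = F.unfold(inp[None, None], kernel_size=2)[0].T         # shape (4, 4)
weights = torch.stack([k1.reshape(-1), k2.reshape(-1)], dim=1)  # shape (4, 2), one column per kernel
print(patches @ weights)
# tensor([[ 83.,  10.],
#         [113.,  14.],
#         [173.,  22.],
#         [203.,  26.]])

# the built-in convolution gives the same values, one 2x2 output channel per kernel
print(F.conv2d(inp[None, None], torch.stack([k1, k2])[:, None]))
```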

And now the process should begin to feel familiar. In standard neural networks, the second matrix contains the weights of each node in a given layer. In this case, the second matrix contains the weights of each kernel in a given layer.


If you have any comments, questions, suggestions, feedback, criticisms, or corrections, please do post them down in the comment section below!
