I’m currently working through lesson 15 of the fastai course and have finished the portion about convolutions. Here, I explain how the convolution operation can be rearranged, or unrolled, as a matrix dot product.
CNNs differ from standard NNs in that, rather than each node, or neuron, having a separate weight for every input, a node holds only a small set of weights that is applied repeatedly across patches of the input. In CNNs, this node, or neuron, is instead known as the kernel.
Say we have the kernel, or node, \begin{array}{|c|c|} \hline 9 & 8 \\ \hline 7 & 6 \\ \hline \end{array} and the input tensor \begin{array}{|c|c|c|} \hline 1 & 2 & 3 \\ \hline 4 & 5 & 6 \\ \hline 7 & 8 & 9 \\ \hline \end{array} .
Convolving these two together would involve first convolving \begin{array}{|c|c|} \hline 9 & 8 \\ \hline 7 & 6 \\ \hline \end{array} and \begin{array}{|c|c|} \hline 1 & 2 \\ \hline 4 & 5 \\ \hline \end{array} , the top-left 2 \times 2 patch of the input. To convolve, multiply the corresponding elements of the two matrices and sum the products: 9 \cdot 1 + 8 \cdot 2 + 7 \cdot 4 + 6 \cdot 5 = 83.
The next convolution involves \begin{array}{|c|c|} \hline 9 & 8 \\ \hline 7 & 6 \\ \hline \end{array} and \begin{array}{|c|c|} \hline 2 & 3 \\ \hline 5 & 6 \\ \hline \end{array} , then \begin{array}{|c|c|} \hline 9 & 8 \\ \hline 7 & 6 \\ \hline \end{array} and \begin{array}{|c|c|} \hline 4 & 5 \\ \hline 7 & 8 \\ \hline \end{array} , and so on.
The resulting activation map in this case would be \begin{array}{|c|c|} \hline 83 & 113 \\ \hline 173 & 203 \\ \hline \end{array} .
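As a quick check, here is a minimal PyTorch sketch (PyTorch being what the fastai course uses; its conv2d performs exactly this multiply-and-sum, without flipping the kernel) that reproduces the activation map above:

```python
import torch
import torch.nn.functional as F

# Input tensor and kernel from the example above.
inp = torch.tensor([[1., 2., 3.],
                    [4., 5., 6.],
                    [7., 8., 9.]])
kernel = torch.tensor([[9., 8.],
                       [7., 6.]])

# conv2d expects (batch, channels, height, width) for the input
# and (out_channels, in_channels, height, width) for the weights.
out = F.conv2d(inp[None, None], kernel[None, None])
print(out[0, 0])  # tensor([[ 83., 113.], [173., 203.]])
```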
However, programmatically, the approach we just performed involves the extensive use of for loops, which are quite slow and inefficient. If we pay close attention to the convolutions performed above, we can see that we are effectively just repeatedly taking products and summing them, which is exactly what a dot product is.
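To make the loop-based approach concrete, here is a minimal sketch of the naive sliding-window version (the function name conv2d_loops is just for this example):

```python
import torch

def conv2d_loops(inp, kernel):
    """Naive convolution: slide the kernel over the input with for loops."""
    kh, kw = kernel.shape
    oh, ow = inp.shape[0] - kh + 1, inp.shape[1] - kw + 1
    out = torch.zeros(oh, ow)
    for i in range(oh):
        for j in range(ow):
            # Multiply the kernel with the current patch and sum the products.
            out[i, j] = (inp[i:i+kh, j:j+kw] * kernel).sum()
    return out

inp = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
kernel = torch.tensor([[9., 8.], [7., 6.]])
print(conv2d_loops(inp, kernel))  # tensor([[ 83., 113.], [173., 203.]])
```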
Therefore, it is possible to rewrite the convolution operation as a matrix dot product, which is significantly quicker to compute, as also demonstrated in this 2006 paper.
If we look again at the first convolution above, \begin{array}{|c|c|} \hline 9 & 8 \\ \hline 7 & 6 \\ \hline \end{array} * \begin{array}{|c|c|} \hline 1 & 2 \\ \hline 4 & 5 \\ \hline \end{array} can be rewritten as \begin{bmatrix} 9 & 8 & 7 & 6 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \\ 4 \\ 5 \end{bmatrix} or, equivalently, \begin{bmatrix} 1 & 2 & 4 & 5 \end{bmatrix} \cdot \begin{bmatrix} 9 \\ 8 \\ 7 \\ 6 \end{bmatrix}, both of which give 83. This latter notation more closely resembles the typical linear combination notation in neural networks: \text{inputs}@\text{kernel} = \text{inputs}@\text{weights} = x@w = x \cdot w^T, where x is the flattened input patch and w is the flattened kernel.
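In code, this is just a dot product of the flattened patch and the flattened kernel; a minimal PyTorch sketch:

```python
import torch

kernel = torch.tensor([[9., 8.], [7., 6.]])
patch  = torch.tensor([[1., 2.], [4., 5.]])

# Flatten both 2x2 matrices into length-4 vectors and take the dot product.
print(torch.dot(kernel.reshape(-1), patch.reshape(-1)))  # tensor(83.)

# The "inputs @ weights" form: row of inputs times column of kernel weights.
print(patch.reshape(1, -1) @ kernel.reshape(-1, 1))      # tensor([[83.]])
```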
If we go back to our original kernel and input tensors, \begin{array}{|c|c|} \hline 9 & 8 \\ \hline 7 & 6 \\ \hline \end{array} and \begin{array}{|c|c|c|} \hline 1 & 2 & 3 \\ \hline 4 & 5 & 6 \\ \hline 7 & 8 & 9 \\ \hline \end{array}, we can rearrange these matrices like this.
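Concretely, each row of the unrolled input matrix is one flattened 2 \times 2 patch, and the kernel is flattened into a single column:

\begin{bmatrix} 1 & 2 & 4 & 5 \\ 2 & 3 & 5 & 6 \\ 4 & 5 & 7 & 8 \\ 5 & 6 & 8 & 9 \end{bmatrix} \cdot \begin{bmatrix} 9 \\ 8 \\ 7 \\ 6 \end{bmatrix} = \begin{bmatrix} 83 \\ 113 \\ 173 \\ 203 \end{bmatrix}

which is just the activation map from before, flattened into a column.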
If we have multiple kernels, or multiple nodes, then it’s as simple as adding another column to the second matrix.
And now the process should begin to feel familiar. In standard neural networks, the second matrix contains the weights of each node in a given layer. In this case, the second matrix contains the weights of each kernel in a given layer.
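Putting it together, here is a minimal PyTorch sketch that unrolls the input with torch.nn.functional.unfold, stacks two kernels as columns of the weight matrix (the second kernel is made up purely for illustration), and checks the result against F.conv2d:

```python
import torch
import torch.nn.functional as F

inp = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
k1 = torch.tensor([[9., 8.], [7., 6.]])    # kernel from the example
k2 = torch.tensor([[1., 0.], [0., -1.]])   # a second, made-up kernel

# unfold extracts every 2x2 patch and flattens it: shape (1, 4, 4);
# transpose to (4 patches, 4 values per patch).
patches = F.unfold(inp[None, None], kernel_size=2)[0].T

# One column of flattened weights per kernel: shape (4 values, 2 kernels).
weights = torch.stack([k1.reshape(-1), k2.reshape(-1)], dim=1)

out = patches @ weights          # (4 patches, 2 kernels)
print(out[:, 0].reshape(2, 2))   # tensor([[ 83., 113.], [173., 203.]])

# Same result as PyTorch's built-in convolution with two output channels.
print(F.conv2d(inp[None, None], torch.stack([k1, k2])[:, None])[0, 0])
```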
If you have any comments, questions, suggestions, feedback, criticisms, or corrections, please do post them down in the comment section below!