Let’s define our quantities! The input patch is
data:image/s3,"s3://crabby-images/20a89/20a893a9bb563cae54e1acefc3429fc5d54b34ad" alt="$$X = \begin{pmatrix} x_1 & x_2 & x_3 \ x_4 & x_5 & x_6 \ x_7 & x_8 & x_9 \end{pmatrix}$$"
a 3x3 kernel is made of
data:image/s3,"s3://crabby-images/fc06d/fc06de2bfdbef7366f5a278743a2e1e66137b75c" alt="$$A = \begin{pmatrix} a_1 & a_2 & a_3 \ a_4 & a_5 & a_6 \ a_7 & a_8 & a_9 \end{pmatrix}$$"
and the two 2x2 kernels are
data:image/s3,"s3://crabby-images/3ef96/3ef968c7fd91774e1250b734925197807a7a37a4" alt="$$B= \begin{pmatrix} b_1 & b_2 \ b_3 & b_4 \end{pmatrix}$$ $$C= \begin{pmatrix} c_1 & c_2 \ c_3 & c_4 \end{pmatrix}$$"
Now, let’s apply the 3x3 convolution to the input. The result is a single number:
data:image/s3,"s3://crabby-images/499a7/499a7e22272905f257234dedeba4b7de1c827960" alt="$$a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 + a_5 x_5 + a_6 x_6 + a_7 x_7 + a_8 x_8 + a_9 x_9$$"
When we instead apply the two 2x2 convolutions in series, we first get
data:image/s3,"s3://crabby-images/1541d/1541dd6ac81a8c0e02cca003a620a953724ae917" alt="$$\begin{pmatrix} b_1 x_1 + b_2 x_2 + b_3 x_4 + b_4 x_5 & b_1 x_2 + b_2 x_3 + b_3 x_5 + b_4 x_6 \ b_1 x_4 + b_2 x_5 + b_3 x_7 + b_4 x_8 & b_1 x_5 + b_2 x_6 + b_3 x_8 + b_4 x_9 \end{pmatrix}$$"
by applying the first kernel to the input and then
data:image/s3,"s3://crabby-images/da9e8/da9e86560ae5e5c46d5182c82836d5c9c4f3e11e" alt="$$c_1 \left( b_1 x_1 + b_2 x_2 + b_3 x_4 + b_4 x_5 \right ) + c_2 \left( b_1 x_2 + b_2 x_3 + b_3 x_5 + b_4 x_6 \right )+ c_3 \left(b_1 x_4 + b_2 x_5 + b_3 x_7 + b_4 x_8 \right) + c_4 \left( b_1 x_5 + b_2 x_6 + b_3 x_8 + b_4 x_9 \right)$$"
by applying the second kernel to that intermediate result! Expanding the products, this becomes
data:image/s3,"s3://crabby-images/44800/44800a79ea4ca7e7c5d893e5436ff92b7f701199" alt="$$c_1 b_1 x_1 + c_1 b_2 x_2 + c_1 b_3 x_4 + c_1 b_4 x_5 + c_2 b_1 x_2 + c_2 b_2 x_3 + c_2 b_3 x_5 + c_2 b_4 x_6 + c_3 b_1 x_4 + c_3 b_2 x_5 + c_3 b_3 x_7 + c_3 b_4 x_8 + c_4 b_1 x_5 + c_4 b_2 x_6 + c_4 b_3 x_8 + c_4 b_4 x_9$$"
If we rearrange things a bit, we can collect the terms for each x_i:
data:image/s3,"s3://crabby-images/f801d/f801dd27e62ac4a9e3f2a943ff33b7c47c6dbb9e" alt="$$ (c_1 b_1) x_1 + (c_1 b_2 + c_2 b_1) x_2 + (c_1 b_3+c_3 b_1) x_4 + (c_1 b_4 + c_2 b_3 + c_3 b_2 + c_4 b_1) x_5 + (c_2 b_2) x_3 + (c_2 b_4+c_4 b_2) x_6 + (c_3 b_3) x_7 + (c_3 b_4 +c_4 b_3) x_8 + (c_4 b_4) x_9$$"
Now we can compare this result with the one we got by applying the single 3x3 kernel to the input (recall, it was this):
data:image/s3,"s3://crabby-images/499a7/499a7e22272905f257234dedeba4b7de1c827960" alt="$$a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 + a_5 x_5 + a_6 x_6 + a_7 x_7 + a_8 x_8 + a_9 x_9$$"
Equating the coefficients of each x_i, we can see that for the two results to be equivalent (for every input) we must have
data:image/s3,"s3://crabby-images/b6e6e/b6e6e723bdd249940f636c97cd4e05975afcfe03" alt="$$\begin{cases} (c_1 b_1) = a_1 \ (c_1 b_2 + c_2 b_1) = a_2 \ (c_2 b_2) = x_3 \ (c_1 b_3+c_3 b_1) = a_4 \ (c_1 b_4 + c_2 b_3 + c_3 b_2 + c_4 b_1) = a_5 \ (c_2 b_4+c_4 b_2) = a_6 \ (c_3 b_3) = a_7 \ (c_3 b_4 +c_4 b_3) = a_8 \ (c_4 b_4) = a_9 \end{cases} $$"
Now, you can easily verify that when you solve this with respect to the kernel A (taking all of B and C as given), it is a trivial system of 9 equations in 9 unknowns, since each equation directly gives one entry of A. I’ve laid out all you need to find the values of A!
(In layman’s terms, the 3x3 convolution can exactly reproduce the result of the two 2x2 convolutions, as long as we do not take the non-linearities into account.)
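Solving for A really is just reading off the equations. The sketch below uses a hypothetical helper `compose_2x2` that builds A from B and C (each a_i is the sum of the c·b products listed in the system) and then checks, on random integer inputs, that one 3x3 pass gives the same output as two 2x2 passes. Assumptions: cross-correlation, stride 1, no padding, and no non-linearity in between; the kernel values are illustrative.

```python
import random

def correlate_valid(X, K):
    """Valid cross-correlation (stride 1, no padding, no kernel flip) of X with K."""
    kh, kw = len(K), len(K[0])
    return [
        [
            sum(K[i][j] * X[r + i][s + j] for i in range(kh) for j in range(kw))
            for s in range(len(X[0]) - kw + 1)
        ]
        for r in range(len(X) - kh + 1)
    ]

def compose_2x2(B, C):
    """Hypothetical helper: the 3x3 kernel A equivalent to B followed by C.

    Every equation of the system has the form a = sum of products c * b;
    in index form that is A[r + p][s + q] += C[r][s] * B[p][q].
    """
    A = [[0] * 3 for _ in range(3)]
    for r in range(2):
        for s in range(2):
            for p in range(2):
                for q in range(2):
                    A[r + p][s + q] += C[r][s] * B[p][q]
    return A

B = [[1, 2], [3, 5]]  # illustrative values
C = [[2, 7], [1, 3]]
A = compose_2x2(B, C)
print(A)  # [[2, 11, 14], [7, 36, 41], [3, 14, 15]]

# One 3x3 pass equals two 2x2 passes on random 5x5 inputs
# (each value of the 3x3 output map sees its own 3x3 patch).
random.seed(0)
for _ in range(5):
    X = [[random.randint(-9, 9) for _ in range(5)] for _ in range(5)]
    assert correlate_valid(correlate_valid(X, B), C) == correlate_valid(X, A)
```

Since the composition is linear and shift-invariant, the equivalence is not limited to a single 3x3 patch: it holds for whole feature maps.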
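The problem comes when you try to solve it the other way around, with respect to B and C! Unless we restrict the values of A, this is a system of 9 equations in only 8 (!!!) unknowns, b_1, ..., b_4 and c_1, ..., c_4, so for a generic A the system has no solution! In fact there are effectively even fewer free unknowns: scaling B by any nonzero t and C by 1/t leaves every product c_i b_j, and hence every equation, unchanged.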
Again, informally speaking, this means that (non-linearities notwithstanding) two 2x2 kernels cannot in general exactly reproduce the result of a single 3x3 convolution!
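A numerical way to see the counting argument (an illustration, not a proof) is to look at the Jacobian of the map (b_1, ..., b_4, c_1, ..., c_4) → (a_1, ..., a_9) defined by the system above. The sketch below computes its rank with exact fractions at one generic choice of B and C; the values and helper names are illustrative assumptions, not from any library. The rank comes out as 7: near that point, the 3x3 kernels reachable by two stacked 2x2 kernels form a 7-dimensional family inside the 9-dimensional space of all 3x3 kernels, and the eighth parameter is lost to the scaling B → tB, C → C/t.

```python
from fractions import Fraction

def compose_2x2(B, C):
    """3x3 kernel equivalent to B followed by C, flattened as [a_1, ..., a_9]."""
    A = [[0] * 3 for _ in range(3)]
    for r in range(2):
        for s in range(2):
            for p in range(2):
                for q in range(2):
                    A[r + p][s + q] += C[r][s] * B[p][q]
    return [v for row in A for v in row]

def unit(i):
    """2x2 kernel with a 1 at flat position i (0-based) and 0 elsewhere."""
    return [[1 if 2 * r + s == i else 0 for s in range(2)] for r in range(2)]

def matrix_rank(rows):
    """Rank of a matrix (list of rows), by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(rank, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for i in range(len(M)):
            if i != rank and M[i][col] != 0:
                f = M[i][col] / M[rank][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[rank])]
        rank += 1
    return rank

B = [[1, 2], [3, 5]]  # illustrative "generic" values
C = [[2, 7], [1, 3]]

# The map (B, C) -> A is bilinear, so its partial derivatives are exact:
# dA/db_i = compose_2x2(unit(i), C) and dA/dc_i = compose_2x2(B, unit(i)).
columns = [compose_2x2(unit(i), C) for i in range(4)]
columns += [compose_2x2(B, unit(i)) for i in range(4)]
jacobian = [list(r) for r in zip(*columns)]  # 9 rows (a_1..a_9), 8 columns

print(len(jacobian), len(jacobian[0]), matrix_rank(jacobian))  # 9 8 7
```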
Of course the same applies to a 5x5 kernel versus two 3x3 kernels, and we know that in practice we often get satisfactory results with the stacked smaller kernels anyway, but this is how you prove “formally” that a single bigger kernel is in general more expressive.
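As a quick sketch of why the argument carries over (parameter counting only, assuming the same one-parameter scaling redundancy as in the 2x2 case): two stacked k x k kernels with stride 1 and no non-linearity are equivalent to one (2k-1) x (2k-1) kernel, so the equations outnumber the at most 2k^2 - 1 effective unknowns by

$$\underbrace{(2k-1)^2}_{\text{entries of the big kernel}} - \underbrace{(2k^2-1)}_{\text{effective unknowns}} = 2(k-1)^2 > 0 \quad \text{for } k \ge 2$$

For k = 2 that is 9 vs 7, and for k = 3 (two 3x3 kernels vs one 5x5) it is 25 vs 17.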
I hope that’s clear! :slight_smile:
EDIT: argh! Equations are not rendered! Give me a sec while I work on a solution! :stuck_out_tongue:
EDIT2: saved? Yes! Thanks, codecogs