How does one optimize the parameters of the filter matrices in a convolutional layer?

Does anyone have a good understanding of what goes on under the hood when training a convolutional layer? How do the filter weights actually get optimized? Any hints/links will be really helpful.

What I am unable to wrap my head around is that while each filter matrix has a very small number of parameters, it ends up touching all the pixels of the input image to produce the new filtered/convolved image. I am thinking of the mapping between these two images (with the convolutional layer in between them) in terms of a matrix (bigger than 3x3), but because the filter is the same everywhere, the elements of this matrix can all be expressed in terms of the original parameters of the filter. What does this matrix then look like?
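To make this concrete, here is a small toy sketch (my own example, not from the course) of that big matrix for a 3x3 filter applied "valid"-style to a 4x4 image: the output is 2x2, so the matrix is 4x16, and every nonzero entry is just one of the 9 filter parameters, repeated with a shifted pattern in each row.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((4, 4))
filt = rng.standard_normal((3, 3))

# Direct sliding-window computation: the 2x2 "valid" output
out = np.array([[np.sum(image[i:i+3, j:j+3] * filt)
                 for j in range(2)] for i in range(2)])

# Build the equivalent 4x16 matrix: one row per output pixel. Each row
# is the filter "stamped" at that pixel's position, zeros elsewhere, so
# the whole matrix is built from only the 9 filter parameters.
M = np.zeros((4, 16))
for i in range(2):
    for j in range(2):
        row = np.zeros((4, 4))
        row[i:i+3, j:j+3] = filt
        M[i * 2 + j] = row.ravel()

# Multiplying the flattened image by M gives the same result as the conv
assert np.allclose(M @ image.ravel(), out.ravel())
```

This banded, weight-tied structure (each filter value reappearing along shifted diagonals) is exactly what makes a conv layer a constrained special case of a fully connected layer.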

Alternatively, how do you train a neural network in the case where parameters are shared between different inputs and outputs?
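For what it's worth, the usual answer to the shared-parameter part is: because the same filter entry is reused at every position, its gradient is the sum of the gradients from all the positions where it was used. A quick sketch checking that against a numerical gradient (again a toy setup of my own, 3x3 filter on a 4x4 image):

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.standard_normal((4, 4))
filt = rng.standard_normal((3, 3))
grad_out = rng.standard_normal((2, 2))   # upstream gradient dL/d(output)

# Analytic gradient w.r.t. the shared filter: accumulate
# (upstream grad at position) * (input patch at position) over positions
grad_filt = np.zeros((3, 3))
for i in range(2):
    for j in range(2):
        grad_filt += grad_out[i, j] * image[i:i+3, j:j+3]

# Finite-difference check of the same gradient
def loss(f):
    out = np.array([[np.sum(image[i:i+3, j:j+3] * f)
                     for j in range(2)] for i in range(2)])
    return np.sum(out * grad_out)

eps = 1e-6
num = np.zeros((3, 3))
for a in range(3):
    for b in range(3):
        fp = filt.copy(); fp[a, b] += eps
        fm = filt.copy(); fm[a, b] -= eps
        num[a, b] = (loss(fp) - loss(fm)) / (2 * eps)

assert np.allclose(grad_filt, num, atol=1e-4)
```

So backprop needs no special machinery for weight sharing: the chain rule automatically sums contributions over every place a parameter appears.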

Hoping my question makes sense, and somebody out there has an answer for me.


Which lessons have you watched so far? I did try to cover this reasonably thoroughly in the course - so hopefully it’s just a case of getting through the lessons…


Thanks for your quick answer, Jeremy. Looking through the course wiki, I see that you are covering this in lesson 4. I just finished lesson 3. Will take a look at lesson 4 now that I have a burning question to be answered.

Also, thinking more about it, maybe I am aiming for the wrong picture to visualize a big matrix. It is probably better to think in terms of a function of 9 variables and an associated cost function that we are then trying to minimize. We can then do this by the usual gradient descent in the parameter space.
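That picture can be run end to end in a few lines. Here is a toy illustration of exactly that view (my own example, not course code): the 9 filter entries are the parameters, the cost is squared error against outputs produced by a hidden "true" filter, and plain gradient descent in the 9-dimensional parameter space recovers it.

```python
import numpy as np

rng = np.random.default_rng(2)
true_filt = rng.standard_normal((3, 3))
images = rng.standard_normal((50, 6, 6))

def conv(img, f):
    # "valid" cross-correlation of a 6x6 image with a 3x3 filter -> 4x4
    return np.array([[np.sum(img[i:i+3, j:j+3] * f)
                      for j in range(4)] for i in range(4)])

targets = np.array([conv(img, true_filt) for img in images])

filt = np.zeros((3, 3))          # start from all-zero parameters
lr = 0.01
for step in range(200):
    grad = np.zeros((3, 3))
    for img, tgt in zip(images, targets):
        err = conv(img, filt) - tgt   # dL/d(output) for L = 0.5*sum(err^2)
        for i in range(4):
            for j in range(4):
                grad += err[i, j] * img[i:i+3, j:j+3]
    filt -= lr * grad / len(images)

# Gradient descent over just 9 parameters recovers the hidden filter
assert np.allclose(filt, true_filt, atol=1e-2)
```

Nothing here is specific to convolutions: it is ordinary gradient descent on a 9-variable cost function, which is precisely the intuition above.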
