While reviewing my notes from courses I took recently, I realized I never fully understood how to compute the number of parameters a CONV layer requires.
Let’s take CONV1 as an example. I thought that layer had $((f \times f \times n_{c}) + 1) \times \textrm{\# filters} = ((5 \times 5 \times 3) + 1) \times 8 = 608$ parameters.
The $n_{c}$ for each filter in the CONV1 layer must be 3, since it has to match the number of channels in the input image. The 1 in the formula above is the bias term, which I’m assuming is a single real-valued scalar per filter. Since the output activation has size $28 \times 28 \times 8$, CONV1 needs 8 filters, one per output channel.
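As a quick sanity check on the arithmetic above, here is a minimal Python sketch. The function name `conv_params` and its argument names are my own, chosen only for illustration, and it assumes square filters with one scalar bias per filter.

```python
def conv_params(f, n_c, n_filters, bias=True):
    """Count the learnable parameters of a CONV layer with square f x f filters."""
    # Each filter spans the full input depth: f * f * n_c weights.
    weights_per_filter = f * f * n_c
    # Assumption: one scalar bias per filter (0 if the layer has no bias).
    bias_per_filter = 1 if bias else 0
    return (weights_per_filter + bias_per_filter) * n_filters


# CONV1 from the example: 5x5 filters, 3 input channels, 8 filters.
print(conv_params(f=5, n_c=3, n_filters=8))  # 608
```

Note that the count depends only on the filter size, the input depth, and the number of filters; the spatial size of the input image does not enter into it.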
For completeness, I went back to Coursera and checked the material again. Prof. Ng has since added a page listing corrections to the slide I posted yesterday; that page wasn’t available when I took the course.