Chapter 13 parameter question

In chapter 13, there is this paragraph:

*The next layer, according to summary, has 296 parameters. Let’s ignore the batch axis to keep things simple. So for each of 14 * 14=196 locations we are multiplying 296-8=288 weights (ignoring the bias for simplicity), so that’s 196 * 288=56_448 multiplications at this layer.*

I don’t follow where the 288 (or 296) comes from. Can someone help?

chapter 13 link

learn.summary gives you the output below. Notice the second Conv2d has 296 parameters. Also notice its output shape, 64 x **8** x 7 x 7: the **8** is the number of output channels, and the number of bias parameters is always equal to the number of output channels.

learn.summary does not tell you where the number of parameters comes from. This line in the simple_cnn model does: conv(4, 8). By default the “kernel” size is 3x3, so you can multiply all of these numbers together to get the number of non-bias parameters: 4*8*3*3=288. Again, the number of bias parameters is equal to the number of output channels, so 288+8=296.

Sequential (Input shape: ['64 x 1 x 28 x 28'])
================================================================
Layer (type)         Output Shape          Param #    Trainable
================================================================
Conv2d               64 x 4 x 14 x 14      40         True
________________________________________________________________
ReLU                 64 x 4 x 14 x 14      0          False
________________________________________________________________
Conv2d               64 x 8 x 7 x 7        296        True
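
To double-check those counts, here is a minimal sketch in plain PyTorch. It assumes the book's conv helper wraps nn.Conv2d with a 3x3 kernel; the stride and padding values below are only there to match the output shapes above and don't affect the parameter count:

```python
import torch.nn as nn

# Assumed equivalents of conv(1, 4) and conv(4, 8): 3x3 kernel, stride 2,
# padding 1, matching the 28x28 -> 14x14 -> 7x7 shapes in the summary.
conv1 = nn.Conv2d(1, 4, kernel_size=3, stride=2, padding=1)
conv2 = nn.Conv2d(4, 8, kernel_size=3, stride=2, padding=1)

print(conv1.weight.shape, conv1.bias.shape)  # torch.Size([4, 1, 3, 3]) torch.Size([4])
print(conv2.weight.shape, conv2.bias.shape)  # torch.Size([8, 4, 3, 3]) torch.Size([8])

print(sum(p.numel() for p in conv1.parameters()))  # 1*4*3*3 + 4 = 40
print(sum(p.numel() for p in conv2.parameters()))  # 4*8*3*3 + 8 = 296
```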

That is awesome, thank you Molly! One follow-up conceptual question, if that’s ok.

For this layer, I can understand why there would be 8 * 3 * 3 trainable parameters, since each filter is 3 by 3 and there are 8 filters total. But why is that also multiplied by the number of filters coming in, i.e. 4?

It seems like the 4 is accounted for by layer 0, which has 4 * 1 * 3 * 3 + 4 = 40 parameters?

For that I suggest playing with this code from the notebook:

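# im3_t is the 28x28 image tensor defined earlier in the notebook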
def apply_kernel(row, col, kernel):
    return (im3_t[row-1:row+2,col-1:col+2] * kernel).sum()

Do you notice the multiply and sum?
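
To connect that to the follow-up question: in a real conv layer, the kernel for one output channel spans all input channels, so the multiply-and-sum runs over in_channels * 3 * 3 values, not just 3 * 3. Here is a rough sketch of a multi-channel version (the names im4_t, kernel4 and apply_kernel4 are made up for illustration):

```python
import torch

# A hypothetical 4-channel 14x14 activation map, like the output of the first conv
im4_t = torch.randn(4, 14, 14)

# The kernel for ONE output channel covers all 4 input channels: 4*3*3 = 36 weights
kernel4 = torch.randn(4, 3, 3)

def apply_kernel4(row, col, kernel):
    # multiply and sum over the channel axis as well as the 3x3 window
    return (im4_t[:, row-1:row+2, col-1:col+2] * kernel).sum()

print(apply_kernel4(5, 5, kernel4))  # one scalar for one output location
print(kernel4.numel())               # 36 weights per output channel
print(8 * kernel4.numel() + 8)       # 8 output channels + 8 biases = 296
```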

> It seems like the 4 is accounted for by layer 0, which has 4 * 1 * 3 * 3 + 4 = 40 parameters?

Where does the “1” in this come from? We are working with 1-channel (greyscale) images, so the first layer has 1 input channel. You are essentially multiplying by the number of input channels.

If you still have questions I can try making a gist to help.
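
In the meantime, here is a small sketch of what such a gist might show: the manual multiply-and-sum at one location agrees with what F.conv2d computes. The tensors x and w below are random, just for illustration:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 14, 14)   # (batch, in_channels, height, width)
w = torch.randn(1, 4, 3, 3)     # (out_channels=1, in_channels=4, 3, 3)

# Manual multiply-and-sum at output location (5, 5), with stride 1 and padding 1
manual = (x[0, :, 4:7, 4:7] * w[0]).sum()

out = F.conv2d(x, w, padding=1)                  # shape (1, 1, 14, 14)
print(torch.allclose(manual, out[0, 0, 5, 5]))   # True
```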


Ah, I get it. Thanks so much!