*The next layer, according to summary, has 296 parameters. Let’s ignore the batch axis to keep things simple. So for each of 14 * 14=196 locations we are multiplying 296-8=288 weights (ignoring the bias for simplicity), so that’s 196 * 288=56_448 multiplications at this layer.

I don’t follow where the 288 (or 296) comes from. Can someone help?

learn.summary gives you the output below. Notice that the second Conv2d has 296 parameters. Also notice the number of output channels in its output shape, 64 x **8** x 7 x 7: the number of bias parameters is always equal to the number of output channels.

learn.summary does not tell you where the number of parameters comes from. This line in the simple_cnn model does: conv(4, 8). By default the "kernel" size is 3x3, so you can multiply the input channels, the output channels, and the two kernel dimensions to get the number of non-bias parameters: 4*8*3*3=288. Again, the number of bias parameters is equal to the number of output channels, so 288+8=296.
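To check the arithmetic, here is a minimal sketch in plain Python. The helper name `conv2d_param_count` is made up for illustration and is not part of fastai or PyTorch; it only assumes a square kernel and one bias per output channel, as described above.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size=3):
    """Return (weights, biases) for a 2D conv layer with a square kernel."""
    # Each output channel has its own kernel_size x kernel_size filter
    # for every input channel.
    weights = in_channels * out_channels * kernel_size * kernel_size
    # One bias per output channel.
    biases = out_channels
    return weights, biases

# The two conv layers shown in the summary: conv(1, 4) and conv(4, 8).
for name, (c_in, c_out) in {"conv(1, 4)": (1, 4), "conv(4, 8)": (4, 8)}.items():
    weights, biases = conv2d_param_count(c_in, c_out)
    print(f"{name}: {weights} weights + {biases} biases = {weights + biases}")
```

This should print 36 + 4 = 40 for the first layer and 288 + 8 = 296 for the second, matching the Param # column.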

Sequential (Input shape: ['64 x 1 x 28 x 28'])
================================================================
Layer (type)         Output Shape         Param #    Trainable
================================================================
Conv2d               64 x 4 x 14 x 14     40         True
________________________________________________________________
ReLU                 64 x 4 x 14 x 14     0          False
________________________________________________________________
Conv2d               64 x 8 x 7 x 7       296        True
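The output shapes in the summary can be reproduced with the standard convolution output-size formula. This is only a sketch: the stride of 2 and padding of 1 are assumptions that are not stated in this thread, chosen because together with a 3x3 kernel they give the 28 → 14 → 7 sizes shown. The helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel_size=3, stride=2, padding=1):
    """Spatial output size of a conv layer along one dimension."""
    # Assumed defaults: 3x3 kernel, stride 2, padding 1 (not confirmed here).
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28  # input images are 28 x 28
for out_channels in (4, 8):
    size = conv_output_size(size)
    print(f"64 x {out_channels} x {size} x {size}")
```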

That is awesome, thank you Molly. One follow-up conceptual question, if that's ok.

For this layer, I can understand why there are 8 * 3 * 3 trainable parameters, since each filter is 3 by 3 and there are 8 filters total. But why is it all multiplied by the number of filters coming in, i.e. 4?

It seems like the 4 is accounted for by layer 0 which has the 4 * 1 * 3 * 3 + 4 = 40 parameters?