Confusion about a concept written in fastbook (intuition about the amount of computation in CNNs)

Hi @vrodriguezf,

This is actually a very good question! Our study group covered these chapters a few weeks ago (we’re now on the last few chapters of the book).

On your second point, about how the majority of the computation happens in the early layers: as you progress deeper into the layers of the model, the number of channels might increase, but the grid size of the activations decreases.

One way of seeing this is by computing the size of the activation maps (which factors in the grid size), which is roughly proportional to the amount of computation done by each layer. Just multiply together the dimensions of the output shape.
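For instance, here’s a quick back-of-the-envelope calculation (the shapes are copied from the resnet18 summary shown below; treat this as a sketch, not an exact FLOP count):

```python
import math

# Activation map size = product of the output shape dimensions
# (batch x channels x height x width), taken from the summary below.
early = (64, 64, 14, 14)  # first Conv2d layer
deep = (64, 512, 1, 1)    # last Conv2d layer

print(f"early: {math.prod(early):,} activations")  # early: 802,816 activations
print(f"deep: {math.prod(deep):,} activations")    # deep: 32,768 activations
```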

If you run learn.summary() for a resnet18 on the MNIST dataset, for example, you can see that the activation maps of the early layers are much larger (because the grid size is still large) than those of the deeper layers.
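For reference, something like the following should reproduce this summary (the exact data loading setup here is my assumption; any Learner wrapping a resnet18 on 28x28 images would do):

```python
# Hedged sketch: build a resnet18 Learner on MNIST and print its summary.
# The folder names ('training'/'testing') match fastai's URLs.MNIST layout.
from fastai.vision.all import *

path = untar_data(URLs.MNIST)
dls = ImageDataLoaders.from_folder(path, train='training', valid='testing', bs=64)
learn = cnn_learner(dls, resnet18, metrics=accuracy)
learn.summary()
```

This prints: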

Sequential (Input shape: ['64 x 3 x 28 x 28'])
================================================================
Layer (type)         Output Shape         Param #    Trainable 
================================================================
Conv2d               64 x 64 x 14 x 14    9,408      True      
________________________________________________________________
BatchNorm2d          64 x 64 x 14 x 14    128        True      
________________________________________________________________
ReLU                 64 x 64 x 14 x 14    0          False     
________________________________________________________________
MaxPool2d            64 x 64 x 7 x 7      0          False     
________________________________________________________________
Conv2d               64 x 64 x 7 x 7      36,864     True      
________________________________________________________________
BatchNorm2d          64 x 64 x 7 x 7      128        True      
________________________________________________________________
ReLU                 64 x 64 x 7 x 7      0          False     
________________________________________________________________
Conv2d               64 x 64 x 7 x 7      36,864     True      
________________________________________________________________
BatchNorm2d          64 x 64 x 7 x 7      128        True      
________________________________________________________________
Conv2d               64 x 64 x 7 x 7      36,864     True      
________________________________________________________________
BatchNorm2d          64 x 64 x 7 x 7      128        True      
________________________________________________________________
ReLU                 64 x 64 x 7 x 7      0          False     
________________________________________________________________
Conv2d               64 x 64 x 7 x 7      36,864     True      
________________________________________________________________
BatchNorm2d          64 x 64 x 7 x 7      128        True      
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     73,728     True      
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True      
________________________________________________________________
ReLU                 64 x 128 x 4 x 4     0          False     
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     147,456    True      
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True      
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     8,192      True      
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True      
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     147,456    True      
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True      
________________________________________________________________
ReLU                 64 x 128 x 4 x 4     0          False     
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     147,456    True      
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True      
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     294,912    True      
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True      
________________________________________________________________
ReLU                 64 x 256 x 2 x 2     0          False     
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     589,824    True      
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True      
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     32,768     True      
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True      
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     589,824    True      
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True      
________________________________________________________________
ReLU                 64 x 256 x 2 x 2     0          False     
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     589,824    True      
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True      
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     1,179,648  True      
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True      
________________________________________________________________
ReLU                 64 x 512 x 1 x 1     0          False     
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     2,359,296  True      
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True      
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     131,072    True      
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True      
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     2,359,296  True      
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True      
________________________________________________________________
ReLU                 64 x 512 x 1 x 1     0          False     
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     2,359,296  True      
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True      
________________________________________________________________
AdaptiveAvgPool2d    64 x 512 x 1 x 1     0          False     
________________________________________________________________
AdaptiveMaxPool2d    64 x 512 x 1 x 1     0          False     
________________________________________________________________
Flatten              64 x 1024            0          False     
________________________________________________________________
BatchNorm1d          64 x 1024            2,048      True      
________________________________________________________________
Dropout              64 x 1024            0          False     
________________________________________________________________
Linear               64 x 512             524,288    True      
________________________________________________________________
ReLU                 64 x 512             0          False     
________________________________________________________________
BatchNorm1d          64 x 512             1,024      True      
________________________________________________________________
Dropout              64 x 512             0          False     
________________________________________________________________
Linear               64 x 10              5,120      True      
________________________________________________________________

Total params: 11,708,992
Total trainable params: 11,708,992

You will also note that even though the number of parameters increases in the deeper layers (before being reduced to the final number of output classes), the actual size of the activation maps gets smaller and smaller.
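To make that concrete, here’s a quick tally of the activations per layer at each grid size, using the output shapes from the summary above:

```python
# Activations per layer at each grid size (batch x channels x h x w),
# copied from the output shapes in the summary above.
stages = {
    "14x14 grid (early)": 64 * 64 * 14 * 14,
    "7x7 grid": 64 * 64 * 7 * 7,
    "4x4 grid": 64 * 128 * 4 * 4,
    "2x2 grid": 64 * 256 * 2 * 2,
    "1x1 grid (deep)": 64 * 512 * 1 * 1,
}
for name, n in stages.items():
    print(f"{name}: {n:,} activations per layer")
```

The counts drop from 802,816 down to 32,768, even as the parameter counts grow.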

My mental picture for this is that the simple features extracted by the early layers are combined into more complex ones by the deeper layers, but the activations (or decision making) by the deeper neurons are actually fewer, because the earlier layers have already extracted the necessary features (there’s a horizontal line near the top, and a diagonal line somewhere in the middle toward the right of the image), so the deeper layers just have to decide whether this is a one or a seven. Something like that…

Hope this answers your question.

Also pinging @marii and @tyoc213 to confirm my understanding :smiley:

Best regards,
Butch
