# Confusion about a concept written in fastbook (intuition of the amount computation in CNNs)

Hi,

I am a bit confused by two passages in fastbook that discuss the intuition behind the amount of computation in the different layers of a CNN.

In chapter 13 (convolutions), it says:

> If we left the number of channels the same in each stride-2 layer, the amount of computation being done in the net would get less and less as it gets deeper. But we know that the deeper layers have to compute semantically rich features (such as eyes or fur), so we wouldn’t expect that doing less computation would make sense.

Then, in chapter 14 (resnet), it says:

> The reason that we have a stem of plain convolutional layers, instead of ResNet blocks, is based on a very important insight about all deep convolutional neural networks: the vast majority of the computation occurs in the early layers.

Reading the first sentence, I understand that computation grows with the depth of a CNN, but reading the second makes me think the opposite.

What am I misunderstanding?
Thanks!

Hi @vrodriguezf,

This is actually a very good question! Our study group covered these chapters a few weeks ago (we’re now on the last few chapters of the book).

Regarding your second point about how the majority of the computation happens in the early layers: as you progress deeper into the model, the number of channels may increase, but the grid size of the activation maps decreases.

One way of seeing this is to compute the size of the activation maps (which factors in the grid size), which is roughly proportional to the amount of computation done by each layer: just multiply together the dimensions of the output shape.
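To make this concrete, here is a minimal plain-Python sketch of that "multiply together the output shape" idea, using figures taken from the resnet18 summary further down (the helper names are mine, not fastai's):

```python
# Rough per-layer cost estimates for a 2-D convolution (no bias):
#   activation elements  = C_out * H_out * W_out
#   multiply-adds (MACs) = C_out * H_out * W_out * C_in * k * k

def act_elems(c_out, h, w):
    """Number of elements in one activation map (per image)."""
    return c_out * h * w

def conv_macs(c_in, c_out, k, h, w):
    """Multiply-accumulate count for one Conv2d forward pass (per image)."""
    return c_out * h * w * c_in * k * k

# First Conv2d of the stem: 3 -> 64 channels, output grid 14 x 14
early_act = act_elems(64, 14, 14)

# One of the last Conv2d layers: 512 channels, output grid 1 x 1
late_act = act_elems(512, 1, 1)

print(early_act, late_act)  # 12544 512
```

The early activation map is ~24x larger despite having 8x fewer channels, because the grid size dominates. With a tiny 28x28 MNIST input the grids collapse quickly; with a 224x224 ImageNet input the stem's grid is 112x112, and the early layers dominate even more.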

If you run `learn.summary` for a `resnet18` on the MNIST dataset, for example, you can see that the activation maps of the early layers are much larger (because the grid size is still large) than those of the deeper layers.

``````
Sequential (Input shape: ['64 x 3 x 28 x 28'])
================================================================
Layer (type)         Output Shape         Param #    Trainable
================================================================
Conv2d               64 x 64 x 14 x 14    9,408      True
________________________________________________________________
BatchNorm2d          64 x 64 x 14 x 14    128        True
________________________________________________________________
ReLU                 64 x 64 x 14 x 14    0          False
________________________________________________________________
MaxPool2d            64 x 64 x 7 x 7      0          False
________________________________________________________________
Conv2d               64 x 64 x 7 x 7      36,864     True
________________________________________________________________
BatchNorm2d          64 x 64 x 7 x 7      128        True
________________________________________________________________
ReLU                 64 x 64 x 7 x 7      0          False
________________________________________________________________
Conv2d               64 x 64 x 7 x 7      36,864     True
________________________________________________________________
BatchNorm2d          64 x 64 x 7 x 7      128        True
________________________________________________________________
Conv2d               64 x 64 x 7 x 7      36,864     True
________________________________________________________________
BatchNorm2d          64 x 64 x 7 x 7      128        True
________________________________________________________________
ReLU                 64 x 64 x 7 x 7      0          False
________________________________________________________________
Conv2d               64 x 64 x 7 x 7      36,864     True
________________________________________________________________
BatchNorm2d          64 x 64 x 7 x 7      128        True
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     73,728     True
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True
________________________________________________________________
ReLU                 64 x 128 x 4 x 4     0          False
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     147,456    True
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     8,192      True
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     147,456    True
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True
________________________________________________________________
ReLU                 64 x 128 x 4 x 4     0          False
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     147,456    True
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     294,912    True
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True
________________________________________________________________
ReLU                 64 x 256 x 2 x 2     0          False
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     589,824    True
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     32,768     True
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     589,824    True
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True
________________________________________________________________
ReLU                 64 x 256 x 2 x 2     0          False
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     589,824    True
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     1,179,648  True
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True
________________________________________________________________
ReLU                 64 x 512 x 1 x 1     0          False
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     2,359,296  True
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     131,072    True
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     2,359,296  True
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True
________________________________________________________________
ReLU                 64 x 512 x 1 x 1     0          False
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     2,359,296  True
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True
________________________________________________________________
AdaptiveAvgPool2d    64 x 512 x 1 x 1     0          False
________________________________________________________________
AdaptiveMaxPool2d    64 x 512 x 1 x 1     0          False
________________________________________________________________
Flatten              64 x 1024            0          False
________________________________________________________________
BatchNorm1d          64 x 1024            2,048      True
________________________________________________________________
Dropout              64 x 1024            0          False
________________________________________________________________
Linear               64 x 512             524,288    True
________________________________________________________________
ReLU                 64 x 512             0          False
________________________________________________________________
BatchNorm1d          64 x 512             1,024      True
________________________________________________________________
Dropout              64 x 512             0          False
________________________________________________________________
Linear               64 x 10              5,120      True
________________________________________________________________

Total params: 11,708,992
Total trainable params: 11,708,992
``````

You will also note that even though the number of parameters increases in the deeper layers (before being reduced to the final number of output classes), the actual size of the activation maps grows smaller and smaller.
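The parameter counts in the summary can be sanity-checked directly: a bias-free Conv2d has `C_in * C_out * k * k` weights, which is independent of the grid size, while the activation map shrinks with the grid. A quick check (helper name is mine; the figures come from the summary above):

```python
# A Conv2d with no bias has c_in * c_out * k * k parameters --
# independent of the grid size -- while its activation map
# (c_out * h * w elements) shrinks as the grid shrinks.

def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k

# First conv of the stem: 3 -> 64 channels, 7x7 kernel
stem = conv_params(3, 64, 7)        # matches "Param # 9,408" above

# A deep conv: 512 -> 512 channels, 3x3 kernel
deep = conv_params(512, 512, 3)     # matches "Param # 2,359,296" above

print(stem, deep)  # 9408 2359296
```

So the parameters grow ~250x from the stem to the final stage, while the activations go the other way (64 x 14 x 14 = 12,544 elements vs 512 x 1 x 1 = 512).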

My mental picture for this is that the simple features extracted by the early layers get combined into more complex ones by the deeper layers, but the activations (or decision making) in the deeper neurons are actually fewer, because the earlier layers have already extracted the necessary features (there's a horizontal line near the top, and a diagonal line somewhere in the middle, to the right of the image), so the deeper layer just has to decide whether this is a one or a seven. Something like that…

Also pinging @marii and @tyoc213 to confirm my understanding

Best regards,
Butch


Thank you for your reply @butchland! There’s also the idea that deeper layers have larger receptive fields (this is also noted in chapter 13), which supports the intuition of deeper layers extracting richer features…
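The receptive-field growth is easy to trace with the standard recurrence: each layer with kernel `k` extends the receptive field by `(k - 1)` times the current "jump" (the product of all strides so far). A small sketch, assuming the usual resnet18 stem (7x7/stride-2 conv, 3x3/stride-2 max pool, then a 3x3/stride-1 conv):

```python
# Receptive field of one output pixel, tracked layer by layer:
#   rf   <- rf + (k - 1) * jump
#   jump <- jump * stride
# The (kernel, stride) pairs follow the standard resnet18 stem.

layers = [(7, 2), (3, 2), (3, 1)]  # conv 7x7/2, maxpool 3x3/2, conv 3x3/1

rf, jump = 1, 1
for k, s in layers:
    rf += (k - 1) * jump
    jump *= s
    print(f"kernel={k} stride={s} -> receptive field {rf}")
```

After just these three layers, one output pixel already "sees" a 19x19 patch of the input, and each further stride-2 stage makes the jump (and hence the growth per layer) larger still.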

I wonder what takes more time in terms of pure computation with GPUs:

• few filters and large activation maps
• lot of filters and small activation maps

Best!

It shows the math you can use to determine how fast or slow a layer is.

Note that the amount of computation is strongly correlated with the speed of a layer, but because we’re running this stuff on actual computers – as opposed to being a pure math problem – other factors are involved too (notably, how many memory accesses are needed).
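To put some representative (made-up) numbers on that point: the two cases in the question above can be set up to do exactly the same arithmetic while moving very different amounts of memory. A sketch, assuming 3x3 convs and a crude traffic model that ignores caching:

```python
# Two convs with identical multiply-add counts but different memory traffic:
#   A: few channels, large grid   (64 -> 64 channels, 56 x 56 output)
#   B: many channels, small grid  (512 -> 512 channels, 7 x 7 output)

def conv_macs(c_in, c_out, k, h, w):
    return c_out * h * w * c_in * k * k

def conv_traffic_elems(c_in, c_out, k, h, w):
    # elements read (input + weights) plus written (output); a crude model
    # that ignores caching and assumes the input grid equals the output grid
    return c_in * h * w + c_in * c_out * k * k + c_out * h * w

macs_a = conv_macs(64, 64, 3, 56, 56)
macs_b = conv_macs(512, 512, 3, 7, 7)
print(macs_a == macs_b)   # True: 115,605,504 MACs each

traffic_a = conv_traffic_elems(64, 64, 3, 56, 56)
traffic_b = conv_traffic_elems(512, 512, 3, 7, 7)
print(macs_a / traffic_a > macs_b / traffic_b)   # True
```

Case B does the same arithmetic but its weights alone are ~2.4M elements, so it performs far fewer MACs per element moved: on real hardware it is more likely to be limited by memory bandwidth than by raw compute, which is exactly the "other factors" caveat above.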


@vrodriguezf I think exactly what you are asking is answered above. While there are some exceptions (EfficientNet), in general what takes the most time should be what takes the most computation.

When this is not the case, the hardware is simply not being used effectively, or was not designed for the problem. ResNet is fairly well supported at this point, so we can say that whatever takes the most computation will take the most time.

If you are designing your own layer, you run into considerations about the memory layout of the GPU and things of that nature (when you are worried about speed, of course).

Thanks! Very nice article!