Continuing the discussion from Confusion about a concept written in fastbook (intuition of the amount computation in CNNs):
I read this thread because I had the exact same question. I'm sorry for repeating a topic, but even after the detailed answer from @butchland I'm still none the wiser. Chapter 13 of the book explicitly calculates the number of multiplications for different layers and concludes:
What happened here is that our stride-2 convolution halved the grid size from 14x14 to 7x7, and we doubled the number of filters from 8 to 16, resulting in **no overall change in the amount of computation**.
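To make the quoted claim concrete, here is a minimal sketch of the multiplication count. The 3x3 kernel and the 4 → 8 → 16 channel progression are my assumptions based on the chapter 13 example; the point is only that halving the grid while doubling the filters leaves the count unchanged:

```python
def conv_mults(grid, c_in, c_out, k=3):
    """Multiplications for a conv layer producing a grid x grid output:
    one k*k*c_in dot product per output position per output channel."""
    return grid * grid * c_out * k * k * c_in

# Layer producing 8 filters on a 14x14 grid (assumed 4 input channels)
before = conv_mults(14, 4, 8)
# Stride-2 layer: grid halves to 7x7, filters double to 16 (8 input channels)
after = conv_mults(7, 8, 16)

print(before, after)  # both counts come out identical
```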
Emphasis mine. So if the amount of computation stays constant across layers, why does chapter 14 use plain convolutional layers instead of ResNet blocks as the stem of a ResNet? The book justifies it like this:
The reason that we have a stem of plain convolutional layers, instead of ResNet blocks, is based on a very important insight about all deep convolutional neural networks: the vast majority of the computation occurs in the early layers. Therefore, we should keep the early layers as fast and simple as possible.
What am I missing here?