Lesson 10, first layer kernel size justification

In lesson 10 at ~1h09m, I’m not sure I understood Jeremy’s point correctly. He says that a 3x3 kernel operating on a color image with 32 filters goes from 3x3x3=27 numbers to 32 numbers, and that this leads to a loss of information. To avoid losing information, he says, the first layer is often 7x7 instead. Then you have 7x7x3=147 numbers going in and 64 coming out (I think 64 is the default for the PyTorch ResNet).
While I understand why you would want to end up with fewer numbers than you started with, I don’t understand how you can lose information if your output space is bigger than the input space (32 vs 27). Could anyone help me understand this?
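
To make the arithmetic concrete, here is a minimal sketch of the counts I mean. It is plain Python, not fastai or PyTorch code, and the helper name `conv_patch_io` is made up for illustration. It only counts the numbers going into and coming out of one kernel position; it says nothing about whether information is actually lost.

```python
def conv_patch_io(kernel_size, in_channels, out_channels):
    """Count the numbers seen and produced at ONE spatial position of a conv layer.

    Hypothetical helper for illustration only.
    """
    n_in = kernel_size * kernel_size * in_channels  # values inside the receptive field
    n_out = out_channels                            # one activation per filter
    return n_in, n_out


# The two cases from the lecture: 3x3 with 32 filters, 7x7 with 64 filters,
# both on a 3-channel (RGB) image.
for k, filters in [(3, 32), (7, 64)]:
    n_in, n_out = conv_patch_io(k, 3, filters)
    print(f"{k}x{k} kernel, {filters} filters: {n_in} in -> {n_out} out "
          f"(out/in = {n_out / n_in:.2f})")
```

So the 3x3 case produces slightly more numbers than it reads (32 vs 27), while the 7x7 case produces fewer than half (64 vs 147), which is the part I am trying to reconcile with the "loss of information" remark.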


@sgugger can you help us with this?