Hey guys, in ResNet, why is this how it's decided whether to downsample the activation or not?
As I understand it, it's because padding is 1 by default for the 3x3 convs. If stride=1 and padding=1 => the output size is the same as the input size. If stride > 1 => the output is smaller than the input => mismatch => downsample
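A quick way to check that arithmetic is the standard conv output-size formula, floor((size + 2*padding - kernel) / stride) + 1. The helper below is only a sketch: the name `conv_out_size` is made up for illustration, and it assumes a 3x3 kernel and dilation 1.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a conv layer, assuming dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1

# stride=1, padding=1 with a 3x3 kernel keeps the size unchanged
print(conv_out_size(56, stride=1))  # 56

# stride=2 roughly halves it, so it no longer matches the input
print(conv_out_size(56, stride=2))  # 28
```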
When the stride is larger than 1, the height and width of the output are smaller than those of the input, so the shortcut has to be downsampled to match before the two can be added. Additionally, the number of channels might change, in which case the "downsample" module is again required (despite the name, there may in fact be no spatial downsampling at all, with only the number of channels being changed). Note that these two scenarios are independent: one might have a stride of, say, 2 without modifying the number of channels, or one might change the number of channels with a stride of 1.
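To make the two conditions concrete, here is a minimal sketch in plain Python. The function names are hypothetical and it assumes 3x3 convs with padding 1; it is not the library's actual code, just the same kind of check (stride differs from 1, or the channel count changes). It compares the shape of the shortcut input with the shape coming out of the main branch.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a conv layer, assuming dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1

def residual_shapes(in_channels, out_channels, size, stride):
    """Return (shortcut shape, main-branch shape) as (C, H, W) tuples."""
    out = conv_out_size(size, stride=stride)
    return (in_channels, size, size), (out_channels, out, out)

def needs_projection(in_channels, out_channels, stride):
    """True if the shortcut cannot be added to the main branch as-is."""
    return stride != 1 or in_channels != out_channels

# (in_channels, out_channels, stride): the two conditions are independent
for cin, cout, stride in [(64, 64, 1), (64, 64, 2), (64, 128, 1), (64, 128, 2)]:
    shortcut, main = residual_shapes(cin, cout, size=56, stride=stride)
    print(cin, cout, stride, shortcut, main, needs_projection(cin, cout, stride))
```

Only the first case, (64, 64, 1), has matching shapes, so it is the only one where the plain identity can be added directly. In the other three the shortcut needs some projection to bring it to the main branch's shape, commonly a 1x1 conv with the same stride.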