Difference between max-pooling and convolutional layers with stride=2


In one of Jeremy’s lectures, he replaces max-pooling layers with convolutional layers with stride=2. Comparing the DarkNet architecture, which also uses convolutional layers with stride=2, against ResNet, we can see that the former achieves better results. Is there any rationale behind choosing a convolutional layer with stride=2 instead of max-pooling? Could max-pooling layers in general be replaced by convolutional layers (are max-pooling layers by chance outdated)?


I haven’t seen a direct comparison between the two approaches, but intuitively a stride-2 convolution seems better than max-pooling, because it keeps information from all pixels in each window rather than discarding some of them. In chapter 14 of the fastai book, the ResNet architecture is based on stride-2 convolutions, followed by an adaptive average pooling layer at the end.
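
To make the intuition concrete, here is a rough pure-Python sketch (no framework, so it is not how fastai or PyTorch implement these layers). It assumes a 4x4 single-channel input and a 2x2 window, and the kernel weights are made up for illustration; in a real network they would be learned. Both operations halve the spatial size, but max-pooling keeps one value per window while the convolution mixes all four.

```python
# Illustrative sketch only: the function names, input, and kernel are hypothetical.

def max_pool_2x2(x):
    """2x2 max-pooling with stride 2: keeps only the largest value in each window."""
    h, w = len(x), len(x[0])
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]

def conv_stride2(x, kernel):
    """2x2 convolution with stride 2, no padding: weighted sum of every value in each window."""
    h, w = len(x), len(x[0])
    return [[sum(kernel[a][b] * x[i + a][j + b] for a in range(2) for b in range(2))
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]

x = [[1, 2, 0, 1],
     [3, 4, 1, 0],
     [0, 1, 5, 6],
     [2, 0, 7, 8]]

# Made-up weights (assumption); a trained layer would learn its own.
kernel = [[0.25, 0.25],
          [0.25, 0.25]]

pooled = max_pool_2x2(x)             # 4x4 -> 2x2, one surviving pixel per window
convolved = conv_stride2(x, kernel)  # 4x4 -> 2x2, every pixel contributes

print(pooled)
print(convolved)
```

With these particular uniform weights the convolution is just average pooling; the point is that a learned kernel can choose how to weight every pixel in the window, whereas max-pooling has no parameters and always drops three of the four values.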