Hi,
In one of Jeremy’s lectures, he replaces max-pooling layers with convolutional layers with stride=2. Comparing the DarkNet architecture, which also uses convolutional layers with stride=2, against ResNet, we can see that the former achieves better results. Is there a rationale for choosing a convolutional layer with stride=2 instead of max-pooling? Can max-pooling layers in general be replaced by convolutional layers (are max-pooling layers by any chance outdated)?
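To make clear what I am comparing, here is a minimal sketch in plain Python. The input size (32), the kernel sizes, and the channel count (64) are my own assumptions for illustration and are not taken from the lecture or from either architecture. It only shows that both options halve the spatial size, while the strided convolution adds learnable weights and max-pooling adds none.

```python
def out_size(n, kernel, stride, padding=0):
    """Spatial output size of a conv or pooling layer (floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_params(c_in, c_out, kernel, bias=True):
    """Number of learnable parameters in a square 2D convolution."""
    return c_in * c_out * kernel * kernel + (c_out if bias else 0)


n = 32  # assumed input height/width, for illustration only

# Option A: 2x2 max-pooling with stride 2 (no learnable parameters)
pooled = out_size(n, kernel=2, stride=2)

# Option B: 3x3 convolution with stride 2 and padding 1 (learned downsampling)
strided = out_size(n, kernel=3, stride=2, padding=1)

# Extra weights introduced by option B for an assumed 64 -> 64 channel layer
extra_weights = conv_params(64, 64, kernel=3)

print(pooled, strided, extra_weights)
```

So the two layers are interchangeable as far as shapes go; my question is about whether the learned version is generally the better choice.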
Thanks!