In Lesson 6, Jeremy presented Dropout as an effective way to regularize our network when correctly tuned. He showed an example of dropout used in a fully-connected network, but what about convolutional neural networks?
The authors of the Dropout paper state:
> The best performing convolutional nets that do not use dropout achieve an error rate of 3.95%. Adding dropout only to the fully connected layers reduces the error to 3.02%. Adding dropout to the convolutional layers as well further reduces the error to 2.55%. Even more gains can be obtained by using maxout units.
>
> The additional gain in performance obtained by adding dropout in the convolutional layers (3.02% to 2.55%) is worth noting. One may have presumed that since the convolutional layers don't have a lot of parameters, overfitting is not a problem and therefore dropout would not have much effect. However, dropout in the lower layers still helps because it provides noisy inputs for the higher fully connected layers which prevents them from overfitting.
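To make the distinction concrete, here is a minimal PyTorch sketch (my own illustration, not code from the paper) showing dropout applied both to the convolutional activations and to the fully-connected head; the layer sizes are arbitrary and assume 32x32 inputs:

```python
import torch.nn as nn

# Dropout in both places: nn.Dropout2d zeroes entire feature maps, which is
# the usual way to apply dropout to conv activations; plain nn.Dropout zeroes
# individual units in the fully-connected head.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Dropout2d(p=0.25),          # dropout on conv activations
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Dropout2d(p=0.25),
    nn.Flatten(),
    nn.Linear(64 * 16 * 16, 256),  # 64 maps of 16x16 after pooling
    nn.ReLU(),
    nn.Dropout(p=0.5),             # the usual dropout before the classifier
    nn.Linear(256, 10),
)
```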
However, most of the recent CNN architectures I have come across don't use Dropout in their convolutional layers anymore, but only at the end, in their fully-connected layers.
Also, fastai seems to follow this trend: when passing the `ps` argument to `create_cnn`, Dropout layers are only created at the end of the network, where the fully-connected layers are.
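For example, something like this (fastai v1; `path` is assumed to point at an image dataset in folder-per-class layout):

```python
from fastai.vision import *

# Build a DataBunch and a pretrained CNN; ps sets the dropout probability
# used in the fully-connected head that fastai puts on top of the body.
data = ImageDataBunch.from_folder(path, size=224)
learn = create_cnn(data, models.resnet34, ps=0.5)

# Inspecting the head shows Dropout layers only in this final block,
# not anywhere in the convolutional body (learn.model[0]).
print(learn.model[1])
```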
So my question is: is Dropout still relevant as a regularizer for convolutional layers, or is the combination of Batch Normalization and weight decay/L2 regularization often enough?