Is dropout still relevant for convolutions?

In Lesson 6, Jeremy presented Dropout as an effective way to regularize our network when correctly tuned. He showed an example of dropout used in a fully-connected network, but what about convolutional neural networks?

Authors from the Dropout paper state:

The best performing convolutional nets that do not use dropout achieve an error rate of 3.95%. Adding dropout only to the fully connected layers reduces the error to 3.02%. Adding dropout to the convolutional layers as well further reduces the error to 2.55%. Even more gains can be obtained by using maxout units.

The additional gain in performance obtained by adding dropout in the convolutional layers (3.02% to 2.55%) is worth noting. One may have presumed that since the convolutional layers don’t have a lot of parameters, overfitting is not a problem and therefore dropout would not have much effect. However, dropout in the lower layers still helps because it provides noisy inputs for the higher fully connected layers which prevents them from overfitting.
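The intuition quoted above (convolutional dropout acting as noise for the later dense layers) can be sketched in a few lines. This is a minimal NumPy illustration of standard inverted dropout applied to a feature map, not fastai's or PyTorch's actual implementation:

```python
import numpy as np

def dropout(x, p=0.5, rng=None):
    """Inverted dropout: zero each unit independently with probability p,
    scale survivors by 1/(1-p) so the expected activation is unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

# a toy 4-channel 8x8 conv feature map
feats = np.ones((4, 8, 8))
# the downstream fully-connected layers now see a randomly perturbed input
noisy = dropout(feats, p=0.25)
```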

However, most of the recent CNN architectures that I came across don’t use Dropout in their convolutional layers anymore, but only at the end, where they have some fully-connected layers.

Also, fastai seems to follow this trend. When passing the ps argument to create_cnn, Dropout layers are only created at the end of the network, where the fully-connected layers are.

So my question is: is Dropout still relevant as a regularizer for convolutional layers, or is the combination of Batch Normalization and weight decay/L2 regularization usually enough?


I’m also interested in having an answer to this question.
I’d like to add something else @bluesky314 mentioned in this post: Google Brain’s DropBlock. It’s a very recent paper (Oct 30th, 2018), but the results seem promising.
They have also released their PyTorch code. Has anybody tried it yet?
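For anyone curious, the core idea of DropBlock is to drop contiguous square regions of a feature map rather than independent units. Below is a simplified NumPy sketch following the paper's description (the gamma formula and rescaling mirror the paper, but this is my own illustration, not the authors' released code):

```python
import numpy as np

def dropblock(x, drop_prob=0.1, block_size=3, rng=None):
    """Zero out block_size x block_size regions of a (C, H, W) feature
    map, then rescale survivors to preserve the expected magnitude."""
    rng = np.random.default_rng() if rng is None else rng
    C, H, W = x.shape
    # seed probability chosen so the expected dropped fraction ~ drop_prob
    gamma = (drop_prob / block_size**2) * (H * W) / (
        (H - block_size + 1) * (W - block_size + 1))
    # sample block centers only where a full block fits
    seeds = rng.random((C, H - block_size + 1, W - block_size + 1)) < gamma
    mask = np.ones_like(x)
    for c, i, j in zip(*np.nonzero(seeds)):
        mask[c, i:i + block_size, j:j + block_size] = 0.0
    kept = mask.sum()
    return x * mask * (mask.size / max(kept, 1.0))
```

Unlike plain dropout, zeroing whole blocks removes semantically correlated neighborhoods, which is why it regularizes conv layers more effectively.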


The DropFilter paper (also very recent: Oct 23rd, 2018) seems to bring some additional insight. I will read it and see how it compares to DropBlock.

I tried DropBlock and it gave me noticeably better results than dropout. It’s also super easy to use.


Most current SoTA architectures don’t use dropout in their convolutional layers. But as others here pointed out, that may change thanks to very recent research.