It’s well known that dropout reduces overfitting in neural networks. But dropout can also have a non-obvious negative effect – it can hide the fact that the model is overfitting.
The background of the issue is the following:
- I use fastai to build a deep fully connected neural network.
- I use dropout to prevent overfitting.
- One of the layers in my network has 7 neurons and 0.5 dropout (see the sketch after this list).
- The validation mechanism in my task requires me to keep training loss and validation loss about the same: if training loss is much lower than validation loss, I get a huge error on the test set.
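For concreteness, here is a minimal sketch of that kind of architecture in plain PyTorch (fastai models are PyTorch modules under the hood). All layer sizes except the 7-neuron layer with 0.5 dropout are made up for illustration:

```python
import torch.nn as nn

# Illustrative fully connected network; only the 7-neuron layer
# with p=0.5 dropout reflects the setup described above.
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 7), nn.ReLU(),
    nn.Dropout(p=0.5),  # high dropout on a small layer
    nn.Linear(7, 1),
)
```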
So when I train the network, at some point I get a good-looking result: train loss 1.62, validation loss 1.58. But the model feels overfitted, and inference on the test set says the same – the model is overfitted. Finally I found the issue. In fastai, train loss is calculated during the training process, i.e. with dropout applied, so the reported train loss is high because of the dropout. Validation loss is calculated without dropout. The actual train loss (without dropout) turns out to be 0.98.
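The gap is easy to reproduce outside of fastai. Below is a minimal PyTorch sketch (the data and layer sizes are toy values for illustration) that evaluates the same model on the same inputs twice: in train mode, where dropout is active, and in eval mode, where it is disabled:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins for the real training set.
x = torch.randn(256, 32)
y = torch.randn(256, 1)

model = nn.Sequential(
    nn.Linear(32, 7), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(7, 1),
)

with torch.no_grad():
    # "Reported" train loss: dropout active, averaged over 100 random masks.
    model.train()
    loss_with_dropout = torch.stack(
        [F.mse_loss(model(x), y) for _ in range(100)]
    ).mean().item()

    # "Actual" train loss: dropout disabled, as during validation.
    model.eval()
    loss_without_dropout = F.mse_loss(model(x), y).item()

print(f"train mode (dropout on):  {loss_with_dropout:.3f}")
print(f"eval mode  (dropout off): {loss_without_dropout:.3f}")
```

If I remember the fastai v2 API correctly, `learn.validate(dl=learn.dls.train)` should report the training-set loss with the model in eval mode, i.e. the actual train loss.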
The issue is caused by using high dropout (0.5) on a small layer (7 neurons). With 20 neurons, the difference between the dropped-out train loss and the actual train loss becomes significantly smaller.
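The reason is the noise dropout injects into the layer’s output. Under the simplifying assumption that each of the n neurons contributes equally to the output, inverted dropout with p=0.5 leaves the expected output unchanged but adds noise whose standard deviation shrinks like 1/sqrt(n), so a 7-neuron layer is much noisier than a 20-neuron one. A small simulation (all numbers hypothetical):

```python
import torch

torch.manual_seed(0)
p = 0.5
for n in (7, 20, 100):
    # n equal contributions summing to 1.0 (the eval-mode output).
    contrib = torch.full((100_000, n), 1.0 / n)
    # Inverted dropout, as in nn.Dropout: zero with prob p, scale by 1/(1-p).
    mask = (torch.rand(100_000, n) > p).float() / (1 - p)
    out = (contrib * mask).sum(dim=1)
    print(f"n={n:3d}  mean={out.mean().item():.3f}  std={out.std().item():.3f}")
```

With 7 neurons the dropped-out output swings by roughly ±38% around its eval-mode value, versus roughly ±22% with 20 neurons; that extra noise is what inflates the reported train loss.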
So be careful when using dropout on small layers – the reported train loss may significantly overestimate the actual train loss.