Is dropout a good idea for convolutional layers? I get the idea for fully connected layers: we don’t want a single weight to have too much importance. But if we apply that to conv layers, don’t we end up damaging the filters?
Validation set.
Validation accuracy
How do we add extra fully connected (xtra_fc) layers? Do we pass the sizes in as a list?
Why do we remove dropout for inference, rather than averaging over the predictions for a bunch of random dropouts? It seems the latter would be better (though more expensive), because the model was trained to predict well with the dropout.
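Averaging predictions over many random dropout masks at test time is known as Monte Carlo dropout. A minimal NumPy sketch (names and sizes are illustrative, not fastai code) of why plain inference with dropout switched off already matches that average, at least for a linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, w, p_drop, train, rng):
    """One linear layer with inverted dropout applied to its input."""
    if train:
        mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
        x = x * mask  # randomly zero units, rescale the survivors
    return x @ w

x = rng.standard_normal(64)
w = rng.standard_normal((64, 10))
p = 0.5

# Standard inference: dropout removed entirely.
deterministic = dropout_forward(x, w, p, train=False, rng=rng)

# Monte Carlo dropout: keep dropout active and average many passes.
mc = np.mean([dropout_forward(x, w, p, train=True, rng=rng)
              for _ in range(10_000)], axis=0)

# The average converges to the deterministic pass.
print(np.max(np.abs(mc - deterministic)))
```

For a single linear layer the expectation over masks equals the no-dropout pass exactly, so inverted dropout at train time recovers it for free. In a deep nonlinear network the equivalence is only approximate, which is why averaging could in principle do slightly better, at the cost of many forward passes.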
What’s the intuition for why dropout works? It’s basically like sleep, doing neuron pruning? Are there other techniques that are more similar to “sleep” than random pruning, kind of like short-term and long-term reinforced activations, similar to short-term and long-term memory?
Based on the fastai lib:
xtra_fc (list of ints): sizes of the extra hidden fully connected layers (the number of neurons in each)
Edit:
Locations of the implementation if you wanted to dive in further.
File: conv_learner.py
Class: ConvnetBuilder
Some people explain it as training an ensemble instead of a single network.
Dropout ensures that all your neurons get trained. If you don’t have dropout, it is quite possible that only the dominant ones get the training. Hence it’s a training-time thing only.
@yinterian Are we able to set custom dropout values for each layer (as opposed to a single dropout value)?
It’s not so much that the model is trained to predict well with dropout; rather, we use dropout to keep the model from memorizing the specific training examples too well.
During an animal’s sleep, connections deemed not valuable are pruned permanently. Dropout is different: it is temporary (a fresh random mask for each training batch) and is not used during prediction. It makes sure the model doesn’t rely too much on any one neuron’s activation, which aids generalization.
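The “temporary” part is easy to see directly: with inverted dropout, training draws a fresh random mask on every call, while at prediction time the layer is just the identity. A toy sketch (not the fastai implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(x, p_drop=0.5, train=True, rng=rng):
    """Inverted dropout: zero units at random while training, identity at eval."""
    if not train:
        return x          # prediction: no neurons are dropped
    mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask       # an independent fresh mask on every call, i.e. every batch

x = np.ones(8)
print(dropout(x, train=True))   # some entries zeroed, survivors scaled to 2.0
print(dropout(x, train=True))   # an independent fresh mask this time
print(dropout(x, train=False))  # unchanged at inference
```

The 1/(1 - p_drop) rescaling keeps the expected activation the same with or without the mask, which is why nothing special is needed at prediction time.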
Why do we monitor the loss to go down instead of the accuracy go up?
Is there a reason you use logsoftmax rather than softmax? I thought softmax already incorporated “information” inside and it doesn’t need another log.
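One standard reason (this is the general numerical argument, not anything fastai-specific) is that computing softmax first and taking the log afterwards can overflow or underflow, while log-softmax can be computed stably in one step, and the negative log-likelihood loss wants log-probabilities anyway. A NumPy illustration:

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax via the max-subtraction trick."""
    z = z - np.max(z)                      # softmax is shift-invariant
    return z - np.log(np.sum(np.exp(z)))

logits = np.array([1000.0, 0.0, -1000.0])

# Naive route: softmax overflows, then the log produces nan/-inf.
with np.errstate(all="ignore"):
    naive = np.log(np.exp(logits) / np.sum(np.exp(logits)))

stable = log_softmax(logits)
print(naive)   # contains nan: exp(1000) overflowed
print(stable)  # finite log-probabilities: [0., -1000., -2000.]
```

In PyTorch, ending the model with `F.log_softmax` and training with `nn.NLLLoss` is equivalent to feeding raw logits to `nn.CrossEntropyLoss`, which is why the log version is the conventional final layer.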
Can dropout be applied to the linear (fully connected) layers? If so, what value does it add?
Because you always need to finish with a probability; the final layer does that for you.
Because as accuracy gets higher, it becomes harder to classify additional examples correctly, but the loss can still improve (the model becomes more confident in the predictions it already gets right).
One intuition is that modifying the network by dropping out random neurons is like training a different network at each step. At inference time, using all the neurons is then like averaging an ensemble of those networks.