MC Dropout and practical ideas for it

I thought I'd bring this up, since it might be interesting for anyone playing with custom training loops, or might just spark some practical ideas.

In short (my interpretation):
The authors of the paper quoted below show that training with dropout can be interpreted as approximate Bayesian inference in a neural network. Bayesian inference is useful when there is not enough data, or when the training data has significant sparse regions.

Monte Carlo Dropout (MC Dropout) simply means keeping dropout active at prediction time, not just during training. A single stochastic prediction is usually less accurate than one made with dropout switched off, but repeating the prediction several times gives a spread that is supposed to make the model look more uncertain when it should be, for example when the input is far from the data it was trained on.
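To make that concrete, here is a minimal PyTorch sketch of keeping dropout on at prediction time. The helper names `enable_mc_dropout` and `mc_predict`, the toy model and the number of samples are all my own assumptions for illustration, not something taken from the paper.

```python
import torch
import torch.nn as nn


def enable_mc_dropout(model: nn.Module) -> nn.Module:
    """Put the model in eval mode, then switch only the dropout layers back to train mode."""
    model.eval()
    for module in model.modules():
        if isinstance(module, (nn.Dropout, nn.Dropout2d, nn.Dropout3d)):
            module.train()
    return model


@torch.no_grad()
def mc_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 20):
    """Run n_samples stochastic forward passes; return their mean and standard deviation."""
    enable_mc_dropout(model)
    # Each pass draws a fresh dropout mask, so the outputs differ from pass to pass.
    samples = torch.stack([model(x) for _ in range(n_samples)])  # (n_samples, batch, out)
    return samples.mean(dim=0), samples.std(dim=0)


# Hypothetical toy regressor, untrained, only here to show the shapes involved.
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(32, 1))
mean, std = mc_predict(toy_model, torch.randn(8, 4), n_samples=50)
print(mean.shape, std.shape)
```

Only the dropout modules are flipped back to train mode, so layers such as batch norm keep their eval behaviour. The per-input `std` is the rough uncertainty signal.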

My own experience with it was accidental: I kept forgetting to switch my NMT model to eval() mode, and then tried to debug why it translated the same phrase 5 different ways. Some of the translations were quite… creative.

Since it is easy to explore (just keep dropout on during predictions), I thought of a few places where it might bring practical value:

  1. Conditional GANs of various sorts - make the critic less predictable for the generator, which might help with mode collapse.
  2. Regression with NNs (which is discussed in the paper). The less sure the model is, the higher the variance we will see in its predictions.
  3. Classification decisions based on several predictions made for the same input (i.e. a quorum) - if the variance is too high, we can automatically detect that the model is just not sure about what it is predicting.
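Ideas 2 and 3 can be sketched without any framework, assuming the repeated stochastic predictions for one input have already been collected. The function names, the 0.8 agreement threshold and the example votes are made up for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def quorum_decision(votes, min_agreement=0.8):
    """Return (label, agreement); label is None when the passes disagree too much."""
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    return (label if agreement >= min_agreement else None), agreement


def regression_summary(samples):
    """Mean and spread of repeated stochastic predictions for one input."""
    return mean(samples), pstdev(samples)


# Made-up class votes from 10 stochastic forward passes on two different inputs.
confident = ["cat"] * 9 + ["dog"]
unsure = ["cat", "dog", "cat", "fox", "dog", "cat", "dog", "fox", "cat", "dog"]
print(quorum_decision(confident))
print(quorum_decision(unsure))

# Made-up regression outputs: a wide spread suggests the model is not sure.
print(regression_summary([2.1, 2.0, 2.2, 1.9]))
print(regression_summary([0.5, 3.8, 2.0, 6.1]))
```

A `None` label is the "model is just not sure" case, which could then be routed to a human or to a fallback rule. The right threshold depends on the task and would need tuning.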

We show that a neural network with arbitrary depth and non-linearities, with dropout applied before every weight layer, is mathematically equivalent to an approximation to the probabilistic deep Gaussian process (Damianou & Lawrence, 2013) (marginalised over its covariance function parameters). We would like to stress that no simplifying assumptions are made on the use of dropout in the literature, and that the results derived are applicable to any network architecture that makes use of dropout exactly as it appears in practical applications.

Also see this thread: Uncertainty in Deep Learning (Bayesian networks, Gaussian processes)


What a coincidence: the recent TWIML talk #255 covers how dropout during inference can be used in active learning to estimate the uncertainty of a prediction, and thus point to which samples should be labelled next.
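A rough sketch of that selection step, assuming we already have softmax outputs from several dropout-enabled passes per unlabelled sample. Ranking by the entropy of the averaged probabilities is one common choice and is my assumption here, not necessarily what the talk describes; the names and numbers are made up.

```python
import math


def predictive_entropy(mc_probs):
    """Entropy of the class distribution averaged over the stochastic passes."""
    n_passes = len(mc_probs)
    n_classes = len(mc_probs[0])
    mean_probs = [sum(p[c] for p in mc_probs) / n_passes for c in range(n_classes)]
    return -sum(p * math.log(p) for p in mean_probs if p > 0)


def pick_for_labelling(pool, k=1):
    """pool maps sample id -> list of per-pass probability vectors; return the k most uncertain ids."""
    ranked = sorted(pool, key=lambda sid: predictive_entropy(pool[sid]), reverse=True)
    return ranked[:k]


# Made-up softmax outputs from 3 stochastic passes for two unlabelled samples.
pool = {
    "sample_a": [[0.95, 0.05], [0.90, 0.10], [0.97, 0.03]],
    "sample_b": [[0.80, 0.20], [0.30, 0.70], [0.55, 0.45]],
}
print(pick_for_labelling(pool, k=1))
```

Here `sample_b` is picked, because its passes disagree and the averaged distribution sits close to 50/50.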

I have recently come across this applied to semantic segmentation to calculate pixelwise uncertainty in the predictions

I’ve also seen it applied to healthcare data: