Thought I'd share this; it might be interesting for playing with customized training loops, or it might spark some practical ideas.

In short (my interpretation):

Folks in the paper below show that the dropout technique is equivalent to Bayesian NNs. Bayesian inference is useful when there is not enough data, or when there are significant sparse regions in the training data.

Monte-Carlo Dropout (MC Dropout) essentially means *applying dropout during predictions* (not just during training). Obviously that lowers overall model performance, but it is supposed to make the model more uncertain when it should be, for example when the input data are far from the data it was trained on.
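A minimal sketch of what this looks like in PyTorch (the toy network, layer sizes, and the number of passes are all made up for illustration): keep the model in eval() mode overall, but flip just the dropout modules back to train() so they stay stochastic at prediction time.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy classifier with a dropout layer (illustrative only)
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # will stay active during prediction
    nn.Linear(16, 3),
)

model.eval()  # puts everything in inference mode...
for m in model.modules():
    if isinstance(m, nn.Dropout):
        m.train()  # ...then re-enable only the dropout layers

x = torch.randn(1, 4)
with torch.no_grad():
    # Several stochastic forward passes over the same input
    preds = torch.stack([model(x) for _ in range(20)])

predictive_mean = preds.mean(dim=0)
predictive_std = preds.std(dim=0)  # disagreement between passes ~ uncertainty
```

The spread of `preds` across passes is the interesting part: identical inputs give different outputs, and the variance grows when the model is unsure.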

My experience with it was accidental: I kept forgetting to switch my NMT model to eval() mode and was trying to debug why it translated the same phrase 5 different ways. Some of the translations were quite…creative.

Since it is easy to explore (just add dropout during predictions), I thought of a few places it might bring practical value:

- Conditional GANs of different sorts: make the critic less predictable for the generator and help with mode collapse.
- Regression with NNs (which is discussed in the paper). The less sure the model is, the higher the variance we will see in its predictions.
- Make classification decisions based on several predictions (i.e. a quorum) for the same input: if the variance is too high, we can automatically detect that the model is just not sure about what it is predicting.
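The quorum idea from the last bullet can be sketched like this (assumptions: PyTorch, any classifier containing `nn.Dropout` layers; the `mc_predict`/`quorum_decision` names, the 30 passes, and the 0.1 threshold are arbitrary choices of mine, not from the paper):

```python
import torch
import torch.nn as nn

def mc_predict(model, x, n_passes=30):
    """Run several stochastic passes; return mean and std of softmax outputs."""
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()  # keep only dropout stochastic
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_passes)]
        )
    return probs.mean(dim=0), probs.std(dim=0)

def quorum_decision(model, x, max_std=0.1):
    """Predict a label, but flag inputs where the passes disagree too much."""
    mean, std = mc_predict(model, x)
    label = mean.argmax(dim=-1)
    confident = std.max(dim=-1).values < max_std  # high variance -> abstain
    return label, confident

# Toy usage with a made-up classifier
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Dropout(0.5), nn.Linear(16, 3))
label, confident = quorum_decision(model, torch.randn(2, 4))
```

Inputs flagged as not confident could then be routed to a fallback model or a human instead of being classified automatically.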

> We show that a neural network with arbitrary depth and non-linearities, with dropout applied before every weight layer, is mathematically equivalent to an approximation to the probabilistic deep Gaussian process (Damianou & Lawrence, 2013) (marginalised over its covariance function parameters). We would like to stress that no simplifying assumptions are made on the use of dropout in the literature, and that the results derived are applicable to any network architecture that makes use of dropout exactly as it appears in practical applications.

See also this thread: Uncertainty in Deep Learning (Bayesian networks, Gaussian processes).