At Max Kelsen, we’ve been researching different methods for working out which predictions your model is ‘sure’ or ‘unsure’ of.
We found a few existing use cases, in particular Uber’s work on measuring uncertainty in neural networks, where they use Monte Carlo Dropout to make many predictions on the same sample and take the variance of those predictions as a measure of certainty.
We’ve used this technique on a text classification problem and it has proven helpful.
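Before diving into the fastai changes, here is a minimal sketch of the idea in plain PyTorch (the toy model, layer sizes and helper names below are our own illustration, not part of the fastai changes): keep the model in eval() mode, switch only the dropout layers back to train(), and stack the outputs of several stochastic forward passes.

    import torch
    import torch.nn as nn

    # Toy classifier with a dropout layer; any model containing nn.Dropout works.
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Dropout(0.3), nn.Linear(64, 3))

    def enable_dropout(m: nn.Module):
        "Switch only dropout modules back to train() mode; everything else stays in eval()."
        if 'dropout' in m.__class__.__name__.lower(): m.train()

    def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_times: int = 10) -> torch.Tensor:
        "Run n_times stochastic forward passes and return the stacked class probabilities."
        model.eval()                  # batch norm etc. stay deterministic
        model.apply(enable_dropout)   # ...but dropout remains active
        with torch.no_grad():
            return torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_times)])

    x = torch.randn(4, 20)                                   # a batch of 4 samples
    preds = mc_dropout_predict(model, x)                     # shape: (n_times, 4, 3)
    mean_pred, uncertainty = preds.mean(0), preds.var(0)     # per-class mean and variance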
The major changes are all within the fastai Learner class:
    def apply_dropout(self, m):
        "If a module contains 'dropout' in its name, it will be switched to .train() mode."
        if 'dropout' in m.__class__.__name__.lower(): m.train()
    def predict_with_mc_dropout(self, item:ItemBase, return_x:bool=False, batch_first:bool=True,
                                with_dropout:bool=True, n_times:int=10, **kwargs):
        "Predict with dropout turned on for n_times (default 10)."
        predictions = []
        for _ in range(n_times):
            predictions.append(self.predict(item, with_dropout=with_dropout))
        return predictions
    def pred_batch(self, ds_type:DatasetType=DatasetType.Valid, batch:Tuple=None, reconstruct:bool=False,
                   with_dropout:bool=False) -> List[Tensor]:
        "Return output of the model on one batch from `ds_type` dataset."
        if batch is not None: xb,yb = batch
        else: xb,yb = self.data.one_batch(ds_type, detach=False, denorm=False)
        cb_handler = CallbackHandler(self.callbacks)
        xb,yb = cb_handler.on_batch_begin(xb, yb, train=False)
        # with_dropout clause
        if not with_dropout:
            preds = loss_batch(self.model.eval(), xb, yb, cb_handler=cb_handler)
        else:
            # Apply dropout at eval() time
            preds = loss_batch(self.model.eval().apply(self.apply_dropout), xb, yb, cb_handler=cb_handler)
        res = _loss_func2activ(self.loss_func)(preds[0])
        if not reconstruct: return res
        res = res.detach().cpu()
        ds = self.dl(ds_type).dataset
        norm = getattr(self.data, 'norm', False)
        if norm and norm.keywords.get('do_y', False):
            res = self.data.denorm(res, do_x=True)
        return [ds.reconstruct(o) for o in res]
    def predict(self, item:ItemBase, return_x:bool=False, batch_first:bool=True, with_dropout:bool=False, **kwargs):
        "Return predicted class, label and probabilities for `item`."
        batch = self.data.one_item(item)
        # Added with_dropout
        res = self.pred_batch(batch=batch, with_dropout=with_dropout)
        raw_pred,x = grab_idx(res, 0, batch_first=batch_first), batch[0]
        norm = getattr(self.data, 'norm', False)
        if norm:
            x = self.data.denorm(x)
            if norm.keywords.get('do_y', False): raw_pred = self.data.denorm(raw_pred)
        ds = self.data.single_ds
        pred = ds.y.analyze_pred(raw_pred, **kwargs)
        x = ds.x.reconstruct(grab_idx(x, 0))
        y = ds.y.reconstruct(pred, x) if has_arg(ds.y.reconstruct, 'x') else ds.y.reconstruct(pred)
        return (x, y, pred, raw_pred) if return_x else (y, pred, raw_pred)
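If the fastai version you are running doesn’t already include these changes, one way to try them out is to attach the functions above to Learner yourself. This is only a sketch; it assumes the functions are defined at module level exactly as shown, taking self as their first argument.

    from fastai.basic_train import Learner

    # Only needed if your fastai v1 install doesn't already ship these methods.
    Learner.apply_dropout = apply_dropout
    Learner.pred_batch = pred_batch
    Learner.predict = predict
    Learner.predict_with_mc_dropout = predict_with_mc_dropout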
predict_with_mc_dropout(n_times:int=10) returns a list of n_times predictions (10 by default), all made with dropout turned on, so each one effectively comes from a slightly different model. You can take the variance of those predictions and use it as a measure of how certain your model is about a given prediction; see the sketch below.
Low variance = low uncertainty
High variance = high uncertainty
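As a concrete (hypothetical) example of turning those predictions into an uncertainty score, assume a trained fastai v1 classifier learn and a single item to classify (both placeholders for your own objects):

    import torch

    preds = learn.predict_with_mc_dropout(item, n_times=10)

    # Each element is the usual (category, pred, raw_pred) triple returned by predict();
    # stack the raw class probabilities from all passes and look at their spread.
    probs = torch.stack([raw_pred for _, _, raw_pred in preds])   # shape: (n_times, n_classes)
    mean_probs, variance = probs.mean(0), probs.var(0)

    predicted_class = mean_probs.argmax()
    uncertainty = variance[predicted_class]   # how much the winning class moved across passes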
This gives you more insight into what your model doesn’t know, which is particularly useful if you want to take things to production (e.g. automatically classify the samples with near-zero variance and send the rest to a human classifier).
Instead of having a completely black box neural network making predictions at X% accuracy, you can make predictions at X% accuracy and have a complementary metric of how certain each prediction is.
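One way that hand-off could look in practice (route_prediction and VARIANCE_THRESHOLD are hypothetical names, and the threshold is something you would tune on a validation set, not a value from the original work):

    import torch

    VARIANCE_THRESHOLD = 1e-3   # tune this on held-out data

    def route_prediction(learn, item, n_times: int = 10):
        "Auto-accept near-zero-variance predictions; flag the rest for human review."
        preds = learn.predict_with_mc_dropout(item, n_times=n_times)
        probs = torch.stack([raw_pred for _, _, raw_pred in preds])
        mean_probs, variance = probs.mean(0), probs.var(0)
        predicted_class = mean_probs.argmax().item()
        if variance[predicted_class] < VARIANCE_THRESHOLD:
            return 'auto', predicted_class          # confident: classify automatically
        return 'human_review', predicted_class      # uncertain: send to a human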
- Increases prediction inference time, since every sample now goes through the model n_times instead of once.