[Article] Bayesian Neural Networks experiments using Fastai

Hello there! :slight_smile:

I just wrote an article introducing Bayesian Neural Networks: how they work, and how they can be leveraged to get uncertainty estimates, almost for free, using MC Dropout!
I then ran a few experiments in Fastai on image, tabular and text data.

This stuff is fun and I recommend having a read, because it opens the way to many things, such as active learning, ethics, parsimony, etc.

Tell me what you think :smiley:


Nice article. Bayesian uncertainty seems like it could pair well with pseudo-labeling additional unlabeled training data. You could use it to throw out the unconfident pseudo-labels, create weights for the pseudo-labels based on confidence, or adjust how soft the pseudo-labels are based on confidence. I’ll have to do some experiments with my current dataset and see how well Bayesian pseudo-labeling works.
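
For instance, filtering or weighting could look something like this (a rough sketch assuming the probabilities from several MC Dropout passes are already stacked into one tensor; filter_pseudo_labels and the threshold value are placeholders of mine, not anything from the article):

import torch

def filter_pseudo_labels(probs_mc, threshold=0.3):
    # probs_mc: (n_passes, n_samples, n_classes) probabilities from MC Dropout passes
    mean_probs = probs_mc.mean(dim=0)
    # predictive entropy of the averaged distribution = per-sample uncertainty
    ent = -(mean_probs * torch.log(mean_probs + 1e-8)).sum(dim=1)
    labels = mean_probs.argmax(dim=1)
    keep = ent < threshold                # throw out unconfident pseudo-labels
    weights = 1.0 - ent / ent.max()       # or keep them all and weight by confidence
    return labels[keep], weights[keep]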

I was reading through your code and saw that you were using learn.predict_with_mc_dropout for the Bayesian uncertainty predictions. For example, in predict_entropy:

def predict_entropy(img, n_times=10):
    # run n_times stochastic forward passes with dropout still active
    pred = learn.predict_with_mc_dropout(img, n_times=n_times)
    # grab the probability tensor (index 2 of each prediction tuple)
    # and stack the passes along a new leading dimension
    probs = [prob[2].view((1, 1) + prob[2].shape) for prob in pred]
    probs = torch.cat(probs)
    # entropy of the stacked probabilities gives the prediction's uncertainty
    e = entropy(probs)
    return e

However, I couldn’t find the definition of predict_with_mc_dropout in your Colab notebook or GitHub repository.

I totally agree with you! Actually, the article that got me motivated was this one about histopathological data, from Nature: https://www.nature.com/articles/s41598-019-50587-1#Abs1

In this article, they do what you said, and study how confidence is an excellent indicator of mislabelled data: misclassified images with low uncertainty were most likely mislabelled.

Yeah, about learn.predict_with_mc_dropout: while working on it, I noticed that someone had sent a PR, which you can find here: PR: Ability to use dropout at prediction time (Monte Carlo Dropout)

I will implement a parallelized version and post the code, so don’t worry about this little missing piece ^^

Hello Daniel,

Brilliant article! I have a similar problem, trying to classify cervical cancer images with a dataset of <1000 images (but growing slowly).

I was wondering whether you have applied this to segmentation, as the Nature paper you were inspired by did? And any updates on posting the full code, among other things learn.predict_with_mc_dropout?

Cheers, Hud

Hello Hud,

Thank you :slight_smile:

I used this on segmentation as well; you can try it on the segmentation exercise that Jeremy showed in the Fastai course (https://course.fast.ai/videos/?lesson=3).

It’s basically the same thing, except that you compute the entropy for each pixel, which makes it quite computationally heavy, but interesting.
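
For example, the per-pixel entropy could be computed like this (a minimal sketch assuming you have stacked the softmax outputs of several MC Dropout passes; pixelwise_entropy is just a name I am using here):

import torch

def pixelwise_entropy(probs_mc):
    # probs_mc: (n_passes, n_classes, H, W) softmax outputs of the MC Dropout passes
    mean_probs = probs_mc.mean(dim=0)                               # (n_classes, H, W)
    # entropy over the class dimension gives one uncertainty value per pixel
    return -(mean_probs * torch.log(mean_probs + 1e-8)).sum(dim=0)  # (H, W)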

I am sorry about the learn.predict_with_mc_dropout thing; I haven’t had time to implement it. Basically, you just need to run the same batch through the model several times with dropout active and concatenate the results, although doing this in parallel would be a good improvement.
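
In the meantime, the idea is roughly the following (a sketch of the technique, not the actual fastai implementation; mc_dropout_predict is a hypothetical name):

import torch

def mc_dropout_predict(model, xb, n_times=10):
    # put everything in eval mode, then switch only the Dropout layers back to
    # train mode so they keep sampling Bernoulli masks at inference time
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()
    with torch.no_grad():
        # run the same batch several times and stack the stochastic predictions
        preds = torch.stack([torch.softmax(model(xb), dim=1) for _ in range(n_times)])
    return preds  # (n_times, batch_size, n_classes)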


Hi, is there any hyperparameter optimiser that works better than Bayesian optimisation?

I tried training on the Titanic dataset with Fastai defaults, Bayesian tuning and XGBoost. The Bayesian-tuned hyperparameters and the Fastai default parameters gave almost the same accuracy (82%), but XGBoost gave a higher accuracy (83%).

Thanks,

@DanyWin Really like the clarity of your article! I had read a few articles on BNNs before, but yours is the first that made their high-level mechanism clear to me.

I would like to learn more about BNNs. Do you know if a BNN could work better than a regular NN on small data (e.g. ~100 samples)? And is it straightforward to translate a BNN for classification into a BNN for object detection / segmentation?


Thank you for your kind comment :slight_smile:

BNNs can be useful for small datasets because they provide metrics about the uncertainty of your predictions. Moreover, they can be used in the active learning setting, because they tell you which samples are the most informative to label next, which speeds up the training of your model.

You can find more details here: https://arxiv.org/abs/1703.02910
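
As a rough illustration, an acquisition step based on predictive entropy could look like this (a sketch assuming an mc_dropout_predict helper like the one above and a loader over the unlabeled pool; all the names here are hypothetical):

import torch

def select_to_label(model, pool_loader, k=100, n_times=10):
    # score every unlabeled sample by predictive entropy, then label the top k
    scores = []
    for xb in pool_loader:
        probs = mc_dropout_predict(model, xb, n_times).mean(dim=0)
        scores.append(-(probs * torch.log(probs + 1e-8)).sum(dim=1))
    scores = torch.cat(scores)
    return scores.topk(k).indices  # indices of the most informative samples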

Actually, getting a BNN can be extremely fast if you choose to consider a BNN model with Bernoulli weights, as it simply amounts to using Dropout both at training and at inference time. Therefore, if you find a model, such as a ResNet, which has been trained with dropout, you can make it Bayesian instantly, simply by keeping Dropout active during inference.


@DanyWin
Thanks for the reference paper.

I am pretty interested in learning how to train a segmentation / object detection model under constrained resources (e.g. limited annotation labor, limited labelled data). It is such a practical issue, one you would face in both industry and personal projects, and it especially fits the industry setting, where you want to maximize the utility of each annotation. It’s great to learn that BNNs are a direction for that!

I saw a few papers related to active learning, but usually they don’t disclose their source code, or access to it is restricted (one example is PolygonRNN++). Do you have any repos in mind that could be good starting points? :grin: