I read somewhere that the output of the Softmax layer does not represent true probabilities, which intuitively makes sense to me since Softmax only ensures that the outputs are in the [0, 1] range and sum to 1. Then I found the paper “On Calibration of Modern Neural Networks”. Fortunately, the paper comes with code, so I quickly tested it on one of my models and got the following output:

```
Before temperature - NLL: 0.263, ECE: 0.016
Optimal temperature: 0.960
After temperature - NLL: 0.265, ECE: 0.017
```

So the Expected Calibration Error was already very low to begin with, and it actually got slightly higher after calibration!

I wonder why this is. I use just a ResNet-18, while the experiments in the paper use a ResNet-110, so maybe only larger networks become poorly calibrated? The graph in Figure 2 shows an ECE of over 0.1 for a ResNet with 20 layers trained on ImageNet. I only have a few classes, while ImageNet has thousands, IIRC. Could that explain it? Or is there something in the fastai library that already calibrates the models? IIRC, Jeremy referred to the Softmax output as probabilities in the lecture, so I wonder whether that could be the case.
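For context, my understanding of what the temperature-scaling step actually does: divide the logits by a single scalar T fitted to minimize NLL on a held-out set, which leaves the argmax (and thus accuracy) unchanged. A minimal NumPy sketch, using a grid search where the paper's code uses LBFGS:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Mean negative log-likelihood of the temperature-scaled logits."""
    p = softmax(logits / T)
    return -np.log(p[np.arange(len(labels)), labels]).mean()

def fit_temperature(logits, labels, grid=np.linspace(0.5, 3.0, 251)):
    # Grid search stands in for the LBFGS fit in the paper's code.
    return min(grid, key=lambda T: nll(logits, labels, T))
```

Since T = 1 is in the search grid, the fitted temperature can never make the validation NLL worse than the uncalibrated model on the set it was fitted on, so an optimal T of 0.960 (barely different from 1) suggests the model was close to calibrated already. The small ECE increase you then see could just be noise from re-binning on a finite evaluation set.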