I read somewhere that the output of the Softmax layer does not represent true probabilities, which intuitively makes sense to me since Softmax only ensures that the outputs are in the [0, 1] range and sum to 1. Then I found the paper “On Calibration of Modern Neural Networks”. Fortunately, the paper comes with code, so I quickly tested it on one of my models and got the following output:

```
Before temperature - NLL: 0.263, ECE: 0.016
Optimal temperature: 0.960
After temperature - NLL: 0.265, ECE: 0.017
```

So the Expected Calibration Error was already very low to begin with, and it actually got slightly higher after calibration!

I wonder why this is. I use just a ResNet-18, while the experiments in the paper use a ResNet-110, so maybe only larger networks become poorly calibrated? The graph in Figure 2 shows an ECE of over 0.1 for a ResNet with 20 layers trained on ImageNet. I only have a few classes, while ImageNet has thousands, IIRC. Could that explain it? Or is there something in the fastai library that already calibrates the models? IIRC, Jeremy referred to the Softmax output as probabilities in the lecture, so I wonder whether that could be the case.
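For context, my understanding of what the temperature-scaling step actually does: divide the logits by a single scalar T fitted to minimize NLL on a held-out set, which leaves the argmax (and thus accuracy) unchanged. A minimal NumPy sketch, using a grid search where the paper's code uses LBFGS:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Mean negative log-likelihood of the temperature-scaled logits."""
    p = softmax(logits / T)
    return -np.log(p[np.arange(len(labels)), labels]).mean()

def fit_temperature(logits, labels, grid=np.linspace(0.5, 3.0, 251)):
    # Grid search stands in for the LBFGS fit in the paper's code.
    return min(grid, key=lambda T: nll(logits, labels, T))
```

Since T = 1 is in the search grid, the fitted temperature can never make the validation NLL worse than the uncalibrated model on the set it was fitted on, so an optimal T of 0.960 (barely different from 1) suggests the model was close to calibrated already. The small ECE increase you then see could just be noise from re-binning on a finite evaluation set.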