Rafael Müller, Simon Kornblith and Geoffrey Hinton have published a paper titled “When Does Label Smoothing Help?”. It has some great visualizations and goes deep into how label smoothing affects the calibration of a neural network, and when to use it and when not to.
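For anyone who hasn't met label smoothing before, here is a minimal sketch of the idea in plain Python: the one-hot target is mixed with a uniform distribution over the classes, so the true class gets `1 - alpha + alpha / K` and every other class gets `alpha / K`. The function names, the example logits and `alpha=0.1` are just my illustrative choices, not something taken from the paper's code.

```python
import math

def smooth_labels(target_index, num_classes, alpha=0.1):
    """Mix a one-hot target with a uniform distribution over the classes."""
    uniform = alpha / num_classes
    return [
        (1.0 - alpha) * (1.0 if k == target_index else 0.0) + uniform
        for k in range(num_classes)
    ]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_dist, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_dist, probs))

# Hypothetical 4-class example where the true class is index 2.
hard = smooth_labels(target_index=2, num_classes=4, alpha=0.0)  # plain one-hot
soft = smooth_labels(target_index=2, num_classes=4, alpha=0.1)  # smoothed

# Very confident, correct logits: nearly free under hard targets,
# but penalized under smoothed targets.
confident_logits = [0.0, 0.0, 10.0, 0.0]
hard_loss = cross_entropy(hard, confident_logits)
soft_loss = cross_entropy(soft, confident_logits)
```

With smoothed targets the loss is no longer driven towards zero by pushing the correct logit ever higher, which is the intuition behind the effect on over-confidence.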
Some visuals:
I wrote a summary article on Medium with the key takeaways, in a bit of a ‘less math, more code’ style:
and the link to the full paper is here: