Issues with DiceLoss UNET

JelleVE · August 21, 2022, 9:34am

Hi, I’m currently working on a project involving building footprint detection (binary segmentation) from aerial images. I have a working implementation that uses the default loss function (CrossEntropyLossFlat) and yields decent results.

Now I have read about DiceLoss and how it is supposed to be a good loss function for this type of problem. I have done a number of test runs, and I find myself struggling with this loss function. Often, especially during the first few epochs, it gives blank predictions (all background); in other cases it does the reverse (all footprint); and other times I get very good results.

Is this loss function typically very sensitive to the learning rate (more so than CrossEntropyLossFlat), or is there something I am missing?

Some implementation details:

- The code is identical for DiceLoss and CrossEntropyLossFlat; only the loss function differs.
- To automate things a bit more, I always use the ‘slide’ learning rate suggested by the lr_finder.
- I’m doing progressive resizing, in which the best model from the previous size is reused.

BobMcDear · August 22, 2022, 4:04pm

Hello,

In my experience, dice loss can be more volatile than cross-entropy, and the loss trajectory is often fickle. For instance, I have sometimes encountered exploding gradients when training with dice loss, an unsurprising phenomenon once you inspect how its gradients are calculated.

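$$D = \frac{2\sum_{i}^{N} p_i g_i}{\sum_{i}^{N} p_i^2 + \sum_{i}^{N} g_i^2}$$

$$\frac{\partial D}{\partial p_j} = 2\,\frac{g_j\left(\sum_{i}^{N} p_i^2 + \sum_{i}^{N} g_i^2\right) - 2\,p_j\left(\sum_{i}^{N} p_i g_i\right)}{\left(\sum_{i}^{N} p_i^2 + \sum_{i}^{N} g_i^2\right)^2}$$

The formulas above, obtained from V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation, give the Dice coefficient D between the network’s predictions p and the ground truth g (summed over N voxels) and its gradient with respect to the prediction p_j. It is easy to see why small values of both p and g are troublesome: the squared denominator becomes minuscule, and the gradients blow up.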
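To make this concrete, here is a small NumPy sketch that evaluates the V-Net Dice coefficient and its analytic gradient on two hypothetical 64x64 tiles: one where the footprint covers about a quarter of the pixels, and one with a 3-pixel footprint and near-zero predictions. The tile size, footprint fractions and prediction values are illustrative assumptions, not data from this thread; on the second tile the per-pixel gradients come out several hundred times larger.

```python
import numpy as np

def vnet_dice(p, g):
    """Dice coefficient D = 2*sum(p*g) / (sum(p^2) + sum(g^2)), as in V-Net."""
    return 2.0 * np.sum(p * g) / (np.sum(p ** 2) + np.sum(g ** 2))

def vnet_dice_grad(p, g):
    """Analytic dD/dp_j for every j, following the V-Net gradient formula."""
    denom = np.sum(p ** 2) + np.sum(g ** 2)
    return 2.0 * (g * denom - 2.0 * p * np.sum(p * g)) / denom ** 2

rng = np.random.default_rng(0)
n = 64 * 64  # one hypothetical 64x64 tile, flattened

# Tile A (assumed): footprint covers ~25% of pixels, predictions hover around 0.5.
g_a = (rng.random(n) < 0.25).astype(float)
p_a = np.clip(0.5 + 0.1 * rng.standard_normal(n), 0.01, 0.99)

# Tile B (assumed): footprint of only 3 pixels, network predicts ~0.001 everywhere.
g_b = np.zeros(n)
g_b[:3] = 1.0
p_b = np.full(n, 1e-3)

max_grad_a = np.abs(vnet_dice_grad(p_a, g_a)).max()
max_grad_b = np.abs(vnet_dice_grad(p_b, g_b)).max()
print(f"large footprint: max |dD/dp| = {max_grad_a:.2e}")
print(f"tiny footprint:  max |dD/dp| = {max_grad_b:.2e}")
```

Note that with no foreground at all (g = 0 everywhere) this formula gives D = 0 and a zero gradient, which is one reason practical implementations add a smoothing constant to the numerator and denominator.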

Additionally, dice loss can be overly sensitive to misclassifications of small objects: mislabelling a few pixels of a small object produces a large loss, comparable to mislabelling a large portion of a big object. This is both a feature and a bug: on the one hand, it makes the model more robust when classifying small objects or rare categories; on the other, it can engender instabilities.
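A quick worked example of this size sensitivity, using hard (0/1) masks on a hypothetical 1-D canvas of 10,000 pixels: missing 5 pixels of a 10-pixel object costs exactly as much Dice loss as missing 500 pixels of a 1,000-pixel object, and far more than missing 5 pixels of the large object. The object sizes and the `object_with_misses` helper are arbitrary illustrative choices.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-7):
    """Hard Dice loss 1 - 2|P and G| / (|P| + |G|) for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def object_with_misses(size, missed, canvas=10_000):
    """Hypothetical helper: an object of `size` pixels whose prediction
    misses the last `missed` pixels (false negatives only)."""
    target = np.zeros(canvas, dtype=bool)
    target[:size] = True
    pred = target.copy()
    pred[size - missed:size] = False
    return pred, target

small = dice_loss(*object_with_misses(size=10, missed=5))
large_half = dice_loss(*object_with_misses(size=1000, missed=500))
large_few = dice_loss(*object_with_misses(size=1000, missed=5))
print(f"10 px object, 5 px missed:     {small:.4f}")
print(f"1000 px object, 500 px missed: {large_half:.4f}")
print(f"1000 px object, 5 px missed:   {large_few:.4f}")
```

Dice scores the overlap ratio rather than the number of wrong pixels, whereas a pixel-averaged cross-entropy, for equally confident mistakes, grows with the count of wrong pixels.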

Therefore, it is best to try both dice loss and cross-entropy on your task and determine empirically which is more appropriate. Nonetheless, a heuristic that has worked well for me is as follows. Select plain cross-entropy if there is no class imbalance. Otherwise, try weighted cross-entropy and, if that proves inaccurate, switch to dice loss. If there is severe class imbalance, better alternatives are also available.
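For the weighted cross-entropy step, one common choice (an assumption here, not something prescribed in this thread) is "balanced" inverse-frequency class weights, N / (n_classes * count_c), so that every class contributes the same total weight. A PyTorch sketch on a toy batch with roughly 5% footprint pixels; `inverse_frequency_weights` is a hypothetical helper.

```python
import torch
import torch.nn as nn

def inverse_frequency_weights(masks: torch.Tensor, n_classes: int = 2) -> torch.Tensor:
    """Hypothetical helper: weight_c = N / (n_classes * count_c), so that
    weight_c * count_c is the same for every class."""
    counts = torch.bincount(masks.flatten(), minlength=n_classes).float()
    return counts.sum() / (n_classes * counts.clamp(min=1))

torch.manual_seed(0)
# Toy batch (assumed): 4 masks of 32x32 with ~5% footprint pixels (class 1).
masks = (torch.rand(4, 32, 32) < 0.05).long()
weights = inverse_frequency_weights(masks)

logits = torch.randn(4, 2, 32, 32)  # (batch, classes, H, W), as a U-Net outputs
weighted_ce = nn.CrossEntropyLoss(weight=weights)
plain_ce = nn.CrossEntropyLoss()
print("class weights:", weights.tolist())
print("weighted CE:", weighted_ce(logits, masks).item())
print("plain CE:   ", plain_ce(logits, masks).item())
```

In fastai, `CrossEntropyLossFlat` forwards extra keyword arguments to `nn.CrossEntropyLoss`, so something like `CrossEntropyLossFlat(weight=weights, axis=1)` should apply the same weighting (the tensor needs to be on the model's device).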

To fix your problem and stabilize training, you could mix dice loss with cross-entropy by taking their (possibly weighted) average.
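A minimal PyTorch sketch of such a mix, assuming (N, C, H, W) logits and (N, H, W) integer masks as a U-Net segmentation head produces. `DiceCELoss`, `alpha` and `smooth` are hypothetical names chosen here, not a fastai or PyTorch API. The Dice term is a soft Dice over softmax probabilities, summed over batch and spatial dimensions and averaged over classes; the smoothing constant keeps it defined when a class is absent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiceCELoss(nn.Module):
    """Hypothetical combined loss: alpha * CE + (1 - alpha) * soft Dice loss."""

    def __init__(self, alpha: float = 0.5, smooth: float = 1.0):
        super().__init__()
        self.alpha = alpha
        self.smooth = smooth
        self.ce = nn.CrossEntropyLoss()

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        ce = self.ce(logits, target)
        probs = logits.softmax(dim=1)
        # (N, H, W) -> (N, C, H, W) one-hot ground truth
        one_hot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
        dims = (0, 2, 3)  # sum over batch and spatial dims, keep the class dim
        inter = (probs * one_hot).sum(dims)
        union = probs.sum(dims) + one_hot.sum(dims)
        dice = (2 * inter + self.smooth) / (union + self.smooth)
        return self.alpha * ce + (1 - self.alpha) * (1 - dice.mean())

torch.manual_seed(0)
loss_fn = DiceCELoss(alpha=0.5)
logits = torch.randn(2, 2, 16, 16, requires_grad=True)
masks = torch.randint(0, 2, (2, 16, 16))
loss = loss_fn(logits, masks)
loss.backward()
print(loss.item(), logits.grad.abs().max().item())
```

Summing the Dice statistics over the whole batch rather than per tile is one design choice that can damp the small-object volatility described above; `alpha` sets the share of cross-entropy and is worth tuning. fastai's own loss classes also provide `activation`/`decodes` methods used by `get_preds`, which this sketch omits.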

Please reach out to me if you have other questions.

JelleVE · August 23, 2022, 8:55am

Thank you very much for taking the time to reply, that was very insightful!