Hi, I’m currently working on a project involving building footprint detection (binary segmentation) from aerial images. I have a working implementation that uses the default loss function (CrossEntropyLossFlat), which yields decent results.
I have since read about DiceLoss and how it is supposed to be a good loss function for this type of problem. I have done a number of test runs, and I find myself struggling with this loss function. Often, especially during the first few epochs, it gives blank predictions (all background); in other cases it does the reverse (all footprint). Other times I get very good results.
Is this loss function typically very sensitive to the learning rate (more so than CrossEntropyLossFlat), or is there something that I am missing?
Some implementation details:
- I’m using exactly the same code for the DiceLoss and CrossEntropyLossFlat runs; only the loss function differs.
- In order to automate things a bit more, I’m always using the ‘slide’ learning rate from the lr_finder.
- I’m doing progressive resizing, in which the best model from the previous image size is reused for the next size.
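
To make clear what I mean by Dice loss, here is a minimal plain-Python sketch of the soft Dice formula as I understand it, evaluated on a toy mask for the two degenerate cases I described (all background, all footprint) and for a perfect prediction. This is not fastai's DiceLoss implementation; the function name `soft_dice_loss`, the `smooth` term, and the 8-pixel mask are my own assumptions for illustration only.

```python
def soft_dice_loss(probs, targets, smooth=1e-6):
    """Illustrative soft Dice loss (hypothetical helper, not the fastai code).

    probs   -- flat list of predicted foreground probabilities in [0, 1]
    targets -- flat list of ground-truth labels (0 = background, 1 = footprint)
    smooth  -- small constant that avoids division by zero on empty masks
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    dice = (2.0 * intersection + smooth) / (total + smooth)
    return 1.0 - dice


# Toy mask: 2 footprint pixels out of 8 (made-up data).
targets = [1, 1, 0, 0, 0, 0, 0, 0]

all_background = [0.0] * len(targets)   # "blank" prediction
all_footprint = [1.0] * len(targets)    # everything predicted as footprint
perfect = [float(t) for t in targets]   # prediction equals the mask

for name, probs in [("all background", all_background),
                    ("all footprint", all_footprint),
                    ("perfect", perfect)]:
    print(f"{name}: {soft_dice_loss(probs, targets):.4f}")
```

On this toy mask the all-background prediction scores a loss of roughly 1.0, the all-footprint prediction roughly 0.6, and the perfect prediction 0.0. Those numbers depend on the foreground fraction I picked, so they only show how the formula behaves, not what happens in my actual runs.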