Reasons for NaN loss on CamVid/Tiramisu/Keras?

clu2033 · June 6, 2017, 1:51am

When training the Tiramisu on the CamVid dataset, the loss is NaN from the very start of training. How does one go about determining the cause of the NaN value? I suspect either the input data or Keras 2's fit_generator.

I checked the input and none of the values appear to be NaN. Even training on one batch leads to NaN loss.
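For anyone repeating this check, here is a minimal sketch of inspecting one batch, assuming the batch is available as NumPy arrays. The helper name `describe_batch` and the array shapes are made up for illustration and are not from the original notebook. It looks at infinities and value ranges as well as NaN, and lists the distinct label values.

```python
import numpy as np

def describe_batch(x, y):
    """Return basic sanity stats for one (inputs, labels) batch."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y)
    return {
        # False if any input value is NaN or +/-inf
        "x_all_finite": bool(np.isfinite(x).all()),
        # Value range of the inputs, ignoring NaN entries
        "x_min": float(np.nanmin(x)),
        "x_max": float(np.nanmax(x)),
        # Distinct label values actually present in the batch
        "y_unique": np.unique(y).tolist(),
    }

# Hypothetical batch: 2 images of 4x4x3, integer labels per pixel
x = np.zeros((2, 4, 4, 3))
y = np.array([[0, 1, 2, 3], [3, 2, 1, 0]])
report = describe_batch(x, y)
print(report)
```

The list of distinct label values is worth reading closely: inputs can be entirely finite while the labels still disagree with what the model expects.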

yashkatariya · June 6, 2017, 3:28pm

Your gradients are exploding. Try using tanh as the activation function.

clu2033 · June 9, 2017, 3:30am

The issue was actually much simpler: the number of classes was wrong. If the input is valid and the hyperparameters are moderate, then the NaN most likely comes from a mistake or bug somewhere in the setup.

DavideBoschetto · August 4, 2017, 2:00pm

Hey there. Any chance you could point me to how to fix NaN losses on CamVid/Tiramisu in practice?
How did you notice that the number of classes was incorrect?

Edit: the label_colors file contained 32 classes, while the Tiramisu was expecting 12. Just for posterity, that’s the thing to fix!
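A quick way to catch this kind of mismatch before training is to compare the label ids in the data against the class count the model was built with. The sketch below assumes integer class ids per pixel; `check_label_range` and the sample array are hypothetical, not taken from the course notebook. That out-of-range ids are what produce the NaN is a plausible mechanism rather than something confirmed in this thread.

```python
import numpy as np

def check_label_range(labels, n_classes):
    """Return the sorted label ids that fall outside [0, n_classes)."""
    ids = np.unique(np.asarray(labels))
    return ids[(ids < 0) | (ids >= n_classes)].tolist()

# Hypothetical labels built from a 32-class palette,
# checked against a model that expects 12 classes
labels = np.array([[0, 3, 11], [12, 31, 5]])
bad = check_label_range(labels, n_classes=12)
print(bad)  # [12, 31]
```

An empty list means every label id fits the model's output layer. Anything else means the label mapping and the model disagree on the number of classes.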