Running the CamVid notebook locally on a 1080 Ti GPU, I'm getting NaN reported for the validation loss at every epoch. I reduced the LR by 10x and was able to get things working, but I would like to know:
1) Why does this occur?
2) What are the recommended steps to deal with it?
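For reference, this is roughly the change that fixed it for me, assuming the `learn` object from the CamVid notebook (the exact rates below are illustrative, not the notebook's defaults):

```python
# Original call that gave NaN validation loss every epoch (rate illustrative)
# learn.fit_one_cycle(10, slice(1e-3))

# Reducing the learning rate by 10x made training stable again
learn.fit_one_cycle(10, slice(1e-4))
```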
It means your LR is too high, and you need to reduce it. (Make sure you have the latest fastai.)
Yeah, I have the latest …
I also found that reducing the batch size remedies this issue as well.
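For anyone else hitting this, the batch size is set when the databunch is built in the lesson's data block pipeline. A rough sketch, assuming the usual CamVid setup (`src`, `size`, and the numbers are illustrative, not exact values from the notebook):

```python
bs = 4  # try halving the batch size you were using, e.g. 8 -> 4

data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)                 # batch size is set here
        .normalize(imagenet_stats))

learn = unet_learner(data, models.resnet34)
```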
Huh - that’s odd. And interesting.
This is your moment, @wgpubs! Time to do some research! You might find something really cool if you can replicate the results and show them to everyone.
Hi Mr. Howard, in lesson 2 of course-v3, when I follow the example code in that notebook, whether I use a high LR or a low LR, I get #na# for valid_loss… and therefore I can't get the valid_loss curve.
Reducing the batch size from 16 to 8 got the validation loss back. Not sure if this is a bug in fastai.
I ran into the same problem as above. Would you please tell me why it happens?
I found the reason Jeremy pointed out earlier. He said the main cause is the learning rate setting. Don't assign it too high a value! Otherwise the ball may bounce into another world and never come back: NaN!
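To make the "ball bouncing away" picture concrete, here is a toy example (plain Python, nothing fastai-specific): gradient descent on f(x) = x² slides down to the minimum with a small step size, but overshoots by more than it started with when the step size is too large, so it diverges.

```python
def descend(lr, steps=10, x=1.0):
    """Gradient descent on f(x) = x**2, whose gradient is 2*x. Returns the trajectory."""
    xs = [x]
    for _ in range(steps):
        x = x - lr * 2 * x          # update: x <- x * (1 - 2*lr)
        xs.append(x)
    return xs

print(descend(lr=0.1))                  # 1.0, 0.8, 0.64, ... -> converges towards 0
print(descend(lr=1.5))                  # 1.0, -2.0, 4.0, -8.0, ... -> bounces further away each step
print(descend(lr=1.5, steps=2000)[-1])  # nan: the iterate overflows to inf, then inf - inf = nan
```

The same thing happens on a real loss surface: with a too-large LR each update overshoots the minimum, the weights and the loss explode, and you end up with NaN instead of a decreasing loss.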
Hi @hitgszf, thank you for your response. I am facing a similar problem. Do you mind explaining where I could update the learning rate?
lr_find does not use the validation data, so the validation loss will be NaN there. Don't worry about it. The validation loss should not be NaN during training.
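In other words (a minimal sketch, assuming a `learn` object like the one in the lesson notebook; the epoch count and LR below are just placeholders):

```python
# The LR finder only runs training mini-batches, so the valid_loss column
# shows #na# here -- that is expected.
learn.lr_find()
learn.recorder.plot()    # pick an LR from the steep part of the curve, before the minimum

# Real training evaluates the validation set after each epoch, so valid_loss
# should be a number here; NaN at this stage usually means the LR is too high.
learn.fit_one_cycle(5, slice(1e-4))
```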
Hey @PalaashAgrawal, thank you! Have a good day!