What are the recommended ways to deal with NaNs appearing in our validation loss?

I'm running the CAMVID notebook locally on a 1080 Ti GPU and getting nan reported for the validation loss at every epoch. I reduced the LR by 10x and was able to get things working, but I would like to know …

1) Why does this occur?

2) What are the recommended steps to deal with it?

It means your LR is too high, and you need to reduce it :slight_smile: (Make sure you have the latest fastai)
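
For anyone else hitting this, here is a minimal sketch of what reducing the LR looks like with the fastai v1 API used in the course-v3 CAMVID notebook. The data pipeline below is paraphrased from the lesson notebook, and the specific numbers (size, bs, lr, epochs) are only illustrative:

```python
from fastai.vision import *

# Build the CamVid segmentation DataBunch (roughly as in lesson3-camvid).
path = untar_data(URLs.CAMVID)
codes = np.loadtxt(path/'codes.txt', dtype=str)
get_y_fn = lambda x: path/'labels'/f'{x.stem}_P{x.suffix}'

data = (SegmentationItemList.from_folder(path/'images')
        .split_by_fname_file('../valid.txt')
        .label_from_func(get_y_fn, classes=codes)
        .transform(get_transforms(), size=(360, 480), tfm_y=True)
        .databunch(bs=8)
        .normalize(imagenet_stats))

learn = unet_learner(data, models.resnet34)

# If valid_loss comes back as nan, drop the max learning rate,
# e.g. by a factor of 10, and train again.
lr = 3e-3
learn.fit_one_cycle(10, slice(lr/10))
```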

Yah I have the latest …

I also found that reducing the batch size remedies this issue as well.
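
If it helps anyone, in fastai v1 the batch size is fixed when the DataBunch is built, so that is the place to change it. A sketch, where `src` stands for the labelled SegmentationItemList from earlier in the notebook (the variable name and sizes are illustrative):

```python
# Rebuild the DataBunch with a smaller batch size, e.g. 8 instead of 16,
# which posters here report as another way to get a finite valid_loss.
data = (src.transform(get_transforms(), size=(360, 480), tfm_y=True)
        .databunch(bs=8)
        .normalize(imagenet_stats))
```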

Huh - that’s odd. And interesting.

This is your moment, @wgpubs! Time to do some research! You might find something really cool if you can replicate the results and show them to everyone.

Hi Mr. Howard, in lesson 2 of course-v3, when I follow the example code in that notebook I get #na# for the valid_loss whether I use a high or a low LR, and therefore I can't get the valid_loss curve :worried:

Reducing the batch size from 16 to 8 got the validation loss back. Not sure if this is a bug in fastai.

I ran into the same issue as above. Could you please tell me why it happens?

I found the reason Jeremy pointed out earlier. He said the main cause is the learning rate setting. Don't give it too high a value! Otherwise, the ball may bounce off into another world and never come back: NaN!

Hi @hitgszf, thank you for your response. I am facing a similar problem. Do you mind explaining where I could update the learning rate?

Thank you!

lr_find does not use the validation data, so the validation loss will show up as #na# there. Don't worry about it. The validation loss should not be NaN during actual training.
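
A quick illustration of the difference (the epoch count and lr value are just placeholders):

```python
# lr_find runs a short mock training pass on the training set only,
# so its progress table reports #na# for valid_loss; that is expected.
learn.lr_find()
learn.recorder.plot()            # pick an lr from the loss-vs-lr curve

# Real training computes the validation loss every epoch; it should be
# a finite number here. nan at this stage means the lr is still too high.
learn.fit_one_cycle(10, slice(1e-4))
```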

Hey @PalaashAgrawal, thank you! Have a good day!