Lesson 9 NLL loss problem


These are my functions… I have used the same code as in the original notebook, but somehow my NLL is coming out as inf… I have no idea why this is happening.

Hello,

If one of the probabilities computed by the softmax function is 0, your loss will diverge to infinity, because the logarithm of 0 is negative infinity. Could you please check what the minimum value in sm_pred is (i.e., sm_pred.min())?

Please note that the nll function negates its input, so negative infinity turns into positive infinity.
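For illustration, here is a minimal sketch of that failure mode (the values are made up, and the nll shown follows the shape of the one in the course notebook):

```python
import torch

# A softmax probability that underflowed to exactly 0 has log = -inf,
# and the negation inside nll flips that to +inf.
probs = torch.tensor([[0.0, 0.3, 0.7]])   # one sample; class-0 probability is 0
sm_pred = probs.log()                     # tensor([[-inf, -1.2040, -0.3567]])
target = torch.tensor([0])                # the true class is the zero-probability one

loss = -sm_pred[range(target.shape[0]), target].mean()  # nll-style: index, average, negate
print(loss)                               # tensor(inf)
```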

Thank you.

Yeah, the min value of sm_pred is -inf.

Hello,

Since the minimum value of sm_pred is negative infinity, that indicates the magnitudes of the values in your logits (that is, pred) are excessively large. Because I am not aware of your full training pipeline, I can’t precisely pin down the source of this problem; however, based on personal experience, a common culprit for such an issue is too high a learning rate.
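As a quick illustration (the logit values are made up):

```python
import torch

# With large-magnitude logits, the smaller class probability underflows
# to an exact 0 in float32, so its log is -inf.
pred = torch.tensor([[100.0, -100.0]])
probs = pred.softmax(dim=-1)     # tensor([[1., 0.]]) -- exp(-200) underflows to 0
print(probs.log().min())         # tensor(-inf)
```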

If you’d like, you could share more information regarding your code, and I’d be happy to help where I can.

2 Likes

This is the Model class.
m = 784, n = 33600

Hello,

Are you receiving negative infinity immediately after initializing your network, or only after training it? Also, please use nn.ModuleList rather than Python’s standard list inside the Model class (a plain list hides the layers’ parameters from model.parameters()), and replace __call__ with forward.
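For reference, here is a sketch of that change, assuming your Model class follows the shape of the one in the course notebook (the layer sizes are placeholders):

```python
import torch.nn as nn

# Before: a plain Python list. The layers are invisible to nn.Module, so
# model.parameters() is empty and the optimizer never updates them.
class Model(nn.Module):
    def __init__(self, n_in, nh, n_out):
        super().__init__()
        self.layers = [nn.Linear(n_in, nh), nn.ReLU(), nn.Linear(nh, n_out)]
    def __call__(self, x):
        for l in self.layers: x = l(x)
        return x

# After: nn.ModuleList registers the layers as submodules, and defining
# forward (rather than overriding __call__) lets nn.Module run its hooks.
class ModelFixed(nn.Module):
    def __init__(self, n_in, nh, n_out):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(n_in, nh), nn.ReLU(), nn.Linear(nh, n_out)])
    def forward(self, x):
        for l in self.layers: x = l(x)
        return x
```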

1 Like

Notebook
This is the notebook from the course that I am reproducing… this is the first instance of the Model class… nn.ModuleList is used later in the notebook, which solves this problem. But I want to know why this is happening when I am using the exact same code.

Hello,

That’s strange; are you running the notebook line by line and still getting this issue? I executed it myself and encountered no such problem. Could you restart the notebook and run everything in order again?

Thank you.

The only difference between the original notebook and my notebook is that I am using the dataset from the Kaggle competition digit-recognizer. I hope the problem is not because of the dataset?

Hello,

Apologies for the late reply. The dataset itself is not to blame per se; however, the scale of the data might be an issue. Could you please print out a random sample from the dataset, as well as its range (e.g., print(data.min(), data.max()))? I ask because it’s possible the data falls in the range [0, 255], whereas the model expects pixel values scaled down to [0, 1] (i.e., divided by 255); raw values that large produce huge logits, which cause exactly the softmax underflow we saw earlier.
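For reference, a minimal sketch of that check (x_train is a stand-in name for your tensor loaded from the digit-recognizer CSV; the shape is assumed):

```python
import torch

# Sketch of the suggested check; x_train stands in for the tensor loaded
# from the Kaggle digit-recognizer CSV (shape assumed: 33600 x 784).
x_train = torch.randint(0, 256, (33600, 784)).float()  # raw pixel values

print(x_train.min(), x_train.max())   # tensor(0.) tensor(255.) -> unscaled

x_train = x_train / 255.0             # scale to [0, 1] before training
print(x_train.min(), x_train.max())   # tensor(0.) tensor(1.)
```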

Thank you.

2 Likes


Yeah, I guess that is the problem; I’ll try normalizing it.

1 Like


Yup, you were right… normalizing solved the issue… thank you so much, man.

1 Like