I’m trying to implement a NN for the complete MNIST dataset, as suggested at the end of chapter 4.
I’m almost done, but I have a problem with the last layer of the model, the F.softmax call.
Sometimes the output tensor from softmax contains NaN (not a number). While debugging I’ve seen that the input tensor to the softmax contains very large values; the exponential inside the softmax turns those values into inf, and the final result is NaN.
For example, here is the first row of the input tensor to softmax during the first epoch:

```
tensor([[1.1537e-10, 3.7890e-33, 1.0000e+00, …, 6.8583e-22, 1.9325e-11,
         2.3996e-06], etc.], grad_fn=)
```
and the output is:

```
tensor([[nan, nan, nan, …, nan, nan, nan], etc.])
```
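Here is a minimal sketch of what I think is happening, using made-up logits of roughly the magnitude I observed (not my actual model outputs): a naive softmax overflows on large inputs, while subtracting the row-wise maximum first keeps everything finite.

```python
import torch

# made-up logits around the magnitude I saw (~90); not my actual model outputs
logits = torch.tensor([[90.0, 89.0, 88.0]])

# naive softmax: exp(90) and exp(89) overflow float32 to inf, and inf/inf = nan
ex = torch.exp(logits)
print(ex / ex.sum(dim=1, keepdim=True))   # tensor([[nan, nan, 0.]])

# numerically stable variant: subtract the row-wise max before exponentiating
shifted = logits - logits.max(dim=1, keepdim=True).values
ex = torch.exp(shifted)
print(ex / ex.sum(dim=1, keepdim=True))   # tensor([[0.6652, 0.2447, 0.0900]])
```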
I’ve worked around it by writing my own softmax implementation:
```python
import torch

def softmax(preds):
    # rescale the logits so exp() stays within float32 range
    temperature = 90
    ex = torch.exp(preds / temperature)
    # normalize each row (one row per example) over the class dimension
    return ex / torch.sum(ex, dim=1, keepdim=True)
```
The key point, I think, is the temperature. I set it to 90 because I saw that the highest value in preds is roughly 90, so dividing by it rescales (smooths) the preds into a range the exponential can handle.
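As a quick sanity check of what the division does (using 90.0 as a stand-in for the largest logit I observed):

```python
import torch

x = torch.tensor(90.0)      # roughly the largest logit I saw
print(torch.exp(x))         # tensor(inf) -- exp(90) overflows float32
print(torch.exp(x / 90))    # tensor(2.7183) -- back in a safe range
```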
BUT WHY did I have to write my own softmax, and why didn’t F.softmax work for me?
The accuracy using F.softmax: [image missing]
The accuracy using my own softmax: [image missing]