where preds is the output of the softmax (without the log) and y is a one-hot tensor, I can’t understand why it uses the probability of the target. I mean, if pred is [[0.9, 0.1]] (90% probability that the input is a 3, 10% probability that it is a 7) and y is [[1, 0]] (meaning the target is a 3), the resulting loss is -0.9?

The softmax says that the model is 90% confident that the input is a 3, and the loss is -0.9? The higher the probability, the higher the loss? I’m confused.

Thanks @hno2, I was second-guessing myself just now. I’m under the impression that because the softmax output is an array of numbers from 0 to 1, and the logarithm of a number in (0, 1] lies in (-inf, 0], the higher the probability, the lower the loss. Taking the above example: if the probability is 0.9, the loss is -ln(0.9) ≈ 0.105; if the probability were 0.05, the loss would have been -ln(0.05) ≈ 3.0. So wrong prediction = higher loss.
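To check the arithmetic above, here is a minimal sketch of the negative log likelihood for a single prediction. The function name `nll` is just an illustration, not the actual fastai/PyTorch function:

```python
import math

def nll(prob_of_target):
    """Negative log of the probability the model assigns to the true class."""
    return -math.log(prob_of_target)

print(nll(0.9))   # confident and correct -> small loss (~0.105)
print(nll(0.05))  # confident and wrong   -> large loss (~3.0)
```

Note how the loss blows up as the probability of the target approaches 0, which is exactly the penalty for a confidently wrong prediction.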

The key is the logarithm, isn’t it?
Please say yes xD

It transforms a high probability into a lower loss, and a low probability into a higher loss.

Hi Manuel. The key is not the log, because log is an increasing function and cannot affect the ordering. If there is any key, it is the negative sign, as @hno2 said above.

In your example, suppose the prediction is [[0.95, 0.05]], a higher confidence that the three is correct. liklihood() gives -0.95. This is a lower loss than -0.9, just as it should be.
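The point above can be checked directly: both the plain negated probability and the negative log likelihood decrease as confidence in the correct class grows, so the log never changes which prediction has the lower loss. A small sketch (the helper names here are hypothetical, not library functions):

```python
import math

def neg_prob(p):
    """Loss without the log: just the negated probability of the target."""
    return -p

def neg_log_prob(p):
    """Standard negative log likelihood of the target probability."""
    return -math.log(p)

# Both losses shrink as the model grows more confident in the correct
# class; the log only changes how steeply the loss grows near p = 0.
for p in (0.5, 0.9, 0.95):
    print(p, neg_prob(p), neg_log_prob(p))
```

So the ordering of predictions is identical under both losses; what the log buys you in practice is a much harsher penalty for near-zero probabilities on the true class, plus nicer gradients.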