# Why does log likelihood use the probability of the target?

Given the log likelihood loss

```python
import torch

def likelihood(preds, y):
    return -torch.sum(y * preds)  # y is one-hot, so this selects the target's probability
```

where `preds` is the output of the softmax (the log is omitted here) and `y` is a one-hot tensor. I can't understand why it uses the probability of the target. I mean, if `preds` is `[[0.9, 0.1]]` (90% probability that the input is a 3, 10% probability that it is a 7) and `y` is `[[1, 0]]` (meaning the target is a 3), the resulting loss is -0.9?
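
For reference, a minimal sketch that reproduces the numbers above (the tensor values are just the example from this post):

```python
import torch

preds = torch.tensor([[0.9, 0.1]])  # softmax output: 90% "3", 10% "7"
y = torch.tensor([[1.0, 0.0]])      # one-hot target: the input is a 3

# same computation as likelihood(preds, y) above
print(-torch.sum(y * preds))        # tensor(-0.9000)
```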

The softmax says that the model is 90% confident that the input is a 3, and the loss is -0.9? The higher the probability, the higher the loss? I'm confused.

Let me try to give you some hints, based on the problem's objective:

• Generally, our goal/objective is to maximize the probability of the (correct) output.
• What happens when we negate it?
• We maximize the probability by minimizing the negative likelihood.

So by multiplying the likelihood by (-1) we go from a Maximization Objective to a Minimization Problem.
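
In symbols (just a restatement of the hint above, writing p_t for the predicted probability of the correct class and θ for the model parameters):

```latex
% maximizing the probability of the correct class is equivalent to
% minimizing its negative (and, since log is monotonic, its negative log)
\arg\max_\theta \, p_t(\theta)
  \;=\; \arg\min_\theta \, -p_t(\theta)
  \;=\; \arg\min_\theta \, -\log p_t(\theta)
```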

Thanks @hno2, I was second-guessing myself just now. I'm now under the impression that, because the softmax output is an array of numbers between 0 and 1, and the logarithm on (0, 1] ranges over (-inf, 0], the higher the probability the lower the loss. Taking the above example: if the probability is 0.9, the loss is -ln(0.9) ≈ 0.1; if the probability were 0.05, the loss would have been -ln(0.05) ≈ 3.0. So a low probability on the correct class means a higher loss.

The key is the logarithm, isn’t it?
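
A minimal sketch to check that intuition, reusing the example numbers from above (`nll` is just an illustrative name, not a function defined earlier in this thread):

```python
import torch

def nll(preds, y):
    # negative log likelihood: take the log of the softmax output first
    return -torch.sum(y * torch.log(preds))

y = torch.tensor([[1.0, 0.0]])                # the target is a 3
print(nll(torch.tensor([[0.9, 0.1]]), y))     # tensor(0.1054) -> confident and correct: small loss
print(nll(torch.tensor([[0.05, 0.95]]), y))   # tensor(2.9957) -> confident but wrong: large loss
```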