Why does log likelihood use the probability of the target?

Given the log likelihood loss

import torch

def likelihood(preds, y):
    # y is one-hot, so y * preds keeps only the predicted probability of the target class
    return -torch.sum(y * preds)

where preds is the output of the softmax (omitting the log) and y is a one-hot tensor, I can’t understand why it uses the probability of the target. I mean, if preds is [[0.9, 0.1]] (90% probability that the input is a 3, 10% probability that it is a 7) and y is [[1, 0]], meaning that the target is a 3, the resulting loss is -0.9?
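
Just to make that concrete, here is a minimal check (the tensor values are the ones from my example) showing that the one-hot y simply picks out the predicted probability of the target class:

import torch

preds = torch.tensor([[0.9, 0.1]])   # softmax output: 90% "3", 10% "7"
y = torch.tensor([[1.0, 0.0]])       # one-hot target: the input is a 3
loss = -torch.sum(y * preds)         # same computation as likelihood(preds, y)
print(loss)                          # tensor(-0.9000): only the target's probability survives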

The softmax says that the model is 90% confident that the input is a 3, and the loss is -0.9? The higher the probability, the higher the loss? I’m confused :man_shrugging:

Let me try to give you a hint, based on the problem’s objective:

  • Generally, our goal/objective is to maximize the probability of the (correct) output.
  • What happens when we negate it?
    • We maximize the probability by minimizing the negative likelihood.

So by multiplying the likelihood by (-1) we go from a maximization objective to a minimization problem.
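
As a rough sketch of that idea (using PyTorch’s log_softmax and nll_loss, which compute the actual negative log likelihood; the example logits are made up), a more confident correct prediction gives a lower loss, so pushing the loss down pushes the target probability up:

import torch
import torch.nn.functional as F

target = torch.tensor([0])                    # the correct class is index 0 ("3")
confident = torch.tensor([[3.0, 0.0]])        # made-up logits that strongly favour class 0
unsure = torch.tensor([[0.1, 0.0]])           # made-up logits that barely favour class 0

for logits in (confident, unsure):
    log_probs = F.log_softmax(logits, dim=1)  # log of the softmax probabilities
    loss = F.nll_loss(log_probs, target)      # negative log likelihood of the target class
    prob = log_probs.exp()[0, 0].item()
    print(f"p(target) = {prob:.3f}   loss = {loss.item():.3f}")

# higher target probability -> lower loss, so minimizing the loss
# is the same as maximizing the probability of the correct class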

Thanks @hno2, I was second-guessing myself just now. I’m under the impression that because the softmax output is an array of numbers between 0 and 1, and the logarithm over (0, 1] ranges over (-inf, 0], the higher the probability the lower the loss. Taking the above example: if the probability is 0.9, the loss is -ln(0.9) ≈ 0.105, while if the probability were 0.05, the loss would have been -ln(0.05) ≈ 2.996, so a low probability on the target = higher loss.

The key is the logarithm, isn’t it?
Please say yes xD

It transforms a high probability into a lower loss, and it transforms a low probability into a higher loss.
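
A quick numerical check of those two values (plain Python, natural logarithm):

import math

for p in (0.9, 0.05):
    print(f"-ln({p}) = {-math.log(p):.3f}")

# -ln(0.9) = 0.105    confident and correct -> small loss
# -ln(0.05) = 2.996   low probability on the target -> large loss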

Hi Manuel. The key is not the log, because log is an increasing function and cannot affect the ordering. If there is any key, it is the negative, as @hno2 said above.

In your example, suppose the prediction is [[0.95, 0.05]], a higher confidence that the three is correct. likelihood() gives -0.95. This is a lower loss than -0.9, just as it should be.
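
To illustrate the point, here is a small sketch (the values are the ones from the thread) comparing the two predictions with and without the log; the ordering is the same either way, because log is increasing, and it is the leading minus sign that turns "higher probability" into "lower loss":

import torch

y = torch.tensor([[1.0, 0.0]])                   # one-hot target: class "3"
for preds in (torch.tensor([[0.90, 0.10]]),
              torch.tensor([[0.95, 0.05]])):
    without_log = -torch.sum(y * preds)          # the likelihood() from the question
    with_log = -torch.sum(y * preds.log())       # the usual negative log likelihood
    print(f"p = {preds[0, 0].item():.2f}   -p = {without_log.item():.3f}   "
          f"-log p = {with_log.item():.3f}")

# p = 0.90   -p = -0.900   -log p = 0.105
# p = 0.95   -p = -0.950   -log p = 0.051
# in both columns the more confident (correct) prediction gets the lower loss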

HTH,
:slightly_smiling_face: