I’m currently reading https://pytorch.org/tutorials/beginner/nn_tutorial.html
Jeremy writes negative log likelihood in Python as follows:
def nll(input, target):
    return -input[range(target.shape[0]), target].mean()
However, I see no notion of log in this Python code. I believe this is because Jeremy uses this function with log_softmax rather than softmax:
return x - x.exp().sum(-1).log().unsqueeze(-1)
Could anyone tell me whether I understood it correctly?
i.e.: “negative_likelihood”(log_softmax) == negative_log_likelihood(softmax) ?
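To make the question concrete, here is a small numerical check (the tensor values are made-up placeholders; nll_with_log is my own hypothetical name for the variant that applies the log itself):

```python
import torch

# made-up logits for 2 samples, 3 classes, and their target labels
x = torch.tensor([[1.0, 2.0, 0.5],
                  [0.3, 0.1, 2.2]])
target = torch.tensor([1, 2])

def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def softmax(x):
    return x.exp() / x.exp().sum(-1, keepdim=True)

def nll(input, target):
    # no log here -- expects log-probabilities as input
    return -input[range(target.shape[0]), target].mean()

def nll_with_log(input, target):
    # log lives here instead -- expects plain probabilities
    return -input[range(target.shape[0]), target].log().mean()

a = nll(log_softmax(x), target)
b = nll_with_log(softmax(x), target)
print(torch.allclose(a, b))  # the two formulations agree
```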
I suppose I am confused by the function’s name, which, if I am correct, should probably be named differently.
Thank you for your time
a bit hair-splitting, but I think they are the same
your version == jeremy’s version == pytorch version
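For what it’s worth, the manual chain also matches PyTorch’s built-in F.cross_entropy, which fuses log_softmax and the NLL step (a quick sketch; the random logits and labels are just placeholder data):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 5)               # placeholder logits: 4 samples, 5 classes
target = torch.randint(0, 5, (4,))  # placeholder class labels

def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def nll(input, target):
    return -input[range(target.shape[0]), target].mean()

manual = nll(log_softmax(x), target)
builtin = F.cross_entropy(x, target)  # log_softmax + nll_loss in one call
print(torch.allclose(manual, builtin))
```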
Yes, I find it a little bit confusing. In the code, there is only ever ONE log operation. So, when you say you are applying the “negative log-likelihood” function after applying the “log softmax” function, it sounds like there will be TWO log operations. The log, however, is only in the “log softmax”.
On the other hand, in this article, for example, the negative log-likelihood function is applied to the softmax, not the “log softmax”. So here the log is in the “negative log-likelihood”.
Ooops, after working through a little bit more of the notebook, it appears that the log has to be bundled with the softmax in order to reduce the number of exp operations. That is,
def log_softmax(x): return (x.exp()/(x.exp().sum(-1,keepdim=True))).log()
is refactored into this:
def log_softmax(x): return x - x.exp().sum(-1,keepdim=True).log()
using some log identities. If you bundled the log with the nll function, then you would not be able to do this.
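To see the refactoring concretely (a small sketch with made-up inputs): the identities log(a/b) = log(a) − log(b) and log(exp(x)) = x turn the naive version into the refactored one, trimming the exp calls from two to one:

```python
import torch

x = torch.tensor([[1.0, 2.0, 0.5]])  # made-up logits

def log_softmax_naive(x):
    # two x.exp() calls, then a log of the quotient
    return (x.exp() / x.exp().sum(-1, keepdim=True)).log()

def log_softmax_refactored(x):
    # log(a/b) = log(a) - log(b), and log(exp(x)) = x
    return x - x.exp().sum(-1, keepdim=True).log()

print(torch.allclose(log_softmax_naive(x), log_softmax_refactored(x)))
```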