Confused about NLL in Jeremy's "what is torch.nn, really?" blog post

Hi everyone,

I’m currently reading that blog post.

Jeremy writes the negative log likelihood in Python as follows:

def nll(input, target):
    # for each row i, pick out input[i, target[i]], then average and negate
    return -input[range(target.shape[0]), target].mean()
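To convince myself what that fancy indexing does, here is a plain-Python sketch (no torch; `nll_plain` is my own name for it): for each row it picks the entry at the target column, then averages and negates.

```python
def nll_plain(log_probs, targets):
    # equivalent of input[range(n), target]: row i, column targets[i]
    picked = [log_probs[i][t] for i, t in enumerate(targets)]
    return -sum(picked) / len(picked)

log_probs = [[-0.2, -1.8, -3.0],
             [-2.5, -0.1, -4.0]]
targets = [0, 1]
print(round(nll_plain(log_probs, targets), 3))  # -> 0.15
```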

However, I see no notion of a log in this Python code. I believe this is because Jeremy uses this function with log_softmax, not softmax:

def log_softmax(x):
    # x - log(sum(exp(x))), i.e. log(softmax(x)) in refactored form
    return x - x.exp().sum(-1).log().unsqueeze(-1)

Could anyone tell me whether I understood it correctly?
i.e.: “negative_likelihood”(log_softmax) == negative_log_likelihood(softmax)?

I suppose I am confused by the function’s name, which, if I am correct, should probably be named otherwise.

Thank you for your time


A bit hair-splitting, but I think they are the same :wink:
nll(log(softmax())) == nll(log_softmax()) == F.cross_entropy
your version == jeremy’s version == pytorch version
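A quick numeric check of that equality, re-implemented in plain Python (no torch) for a single example; the names and the toy logits here are my own:

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(xs):
    # the refactored form from the notebook: x - log(sum(exp(x)))
    log_sum = math.log(sum(math.exp(x) for x in xs))
    return [x - log_sum for x in xs]

def nll(log_probs, target):
    # "negative likelihood" of already-logged probabilities
    return -log_probs[target]

logits, target = [1.0, 2.0, 0.5], 1
via_log_softmax = nll(log_softmax(logits), target)       # nll(log_softmax(x))
via_softmax = -math.log(softmax(logits)[target])         # -log(softmax(x)[t])
print(abs(via_log_softmax - via_softmax) < 1e-9)  # -> True, same number either way
```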


Yes, I find it a little bit confusing. In the code, there is only ever ONE log operation. So, when you say you are applying the “negative log-likelihood” function after applying the “log softmax” function, it sounds like there will be TWO log operations. The log, however, is only in the “log softmax”.

On the other hand, in this article, for example, the negative log-likelihood function is applied to the softmax, not the “log softmax”. So here the log is in the “negative log-likelihood”.

Oops, after working through a little bit more of the notebook, it appears that the log has to be bundled with the softmax in order to reduce the number of exp operations. That is,

def log_softmax(x): return (x.exp()/(x.exp().sum(-1,keepdim=True))).log()

is refactored into this:

def log_softmax(x): return x - x.exp().sum(-1,keepdim=True).log()

using the log identities log(a/b) = log(a) - log(b) and log(exp(x)) = x. If you bundled the log with the nll function instead, you would not be able to do this refactoring.
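To check that the refactoring really is behavior-preserving, here are both versions side by side in plain Python (no torch; function names are mine):

```python
import math

def log_softmax_naive(xs):
    # log(exp(x) / sum(exp(x))): exponentiate, normalize, then take the log
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [math.log(e / total) for e in exps]

def log_softmax_refactored(xs):
    # x - log(sum(exp(x))): the log identities cancel the per-element exp/log pair
    log_sum = math.log(sum(math.exp(x) for x in xs))
    return [x - log_sum for x in xs]

xs = [0.3, -1.2, 2.0]
print(all(abs(a - b) < 1e-9
          for a, b in zip(log_softmax_naive(xs), log_softmax_refactored(xs))))  # -> True
```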