NLLLoss implementation


(Aki Rehn) #1

Hi.

NLLLoss was mentioned in a couple of lectures, but the implementation was never really explained.

The PyTorch implementation leads to C code:

I’m having a really hard time grasping this concept. I think the implementation was skipped during the lessons, but if I’m wrong about this, I’d be really grateful for a link pointing to the video!

Also, the wiki only mentions the mathematical formula for multi-class log loss, but does not include anything about Python code:

http://wiki.fast.ai/index.php/Log_Loss#Multi-class_Classification

How would one actually implement this in PyTorch using plain Python?

Thanks in advance!


(Nick) #2

Hi. I also had a hard time grasping this, especially because of the confusion between CrossEntropyLoss and NLLLoss. I’m not sure the implementation of negative log likelihood loss was ever explained in the courses. In short: CrossEntropyLoss = LogSoftmax + NLLLoss. Here is a quick example with an NLLLoss implementation:

import torch
torch.manual_seed(1)

def NLLLoss(logs, targets):
    # pick the log-probability of the correct class for each sample
    out = torch.zeros_like(targets, dtype=torch.float)
    for i in range(len(targets)):
        out[i] = logs[i][targets[i]]
    # negative mean over the batch
    return -out.sum()/len(out)

x = torch.randn(3, 5)
y = torch.LongTensor([0, 1, 2])
cross_entropy_loss = torch.nn.CrossEntropyLoss()
log_softmax = torch.nn.LogSoftmax(dim=1)
x_log = log_softmax(x)

nll_loss = torch.nn.NLLLoss()
print("Torch CrossEntropyLoss: ", cross_entropy_loss(x, y))
print("Torch NLL loss: ", nll_loss(x_log, y))
print("Custom NLL loss: ", NLLLoss(x_log, y))
# Torch CrossEntropyLoss:  tensor(1.8739)
# Torch NLL loss:  tensor(1.8739)
# Custom NLL loss:  tensor(1.8739)

NLLLoss also supports a ‘reduce’ parameter, which is True by default (later PyTorch versions replace it with the ‘reduction’ parameter). With it, the function would look something like this:

def NLLLoss(logs, targets, reduce=True):
    out = torch.zeros_like(targets, dtype=torch.float)
    for i in range(len(targets)):
        out[i] = logs[i][targets[i]]
    # with reduce=False, return the per-sample losses instead of their mean
    return -(out.sum()/len(out) if reduce else out)
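As a quick sanity check (a sketch using the modern `reduction='none'` keyword of `torch.nn.NLLLoss`, which replaced the deprecated `reduce` flag), the unreduced custom output should match PyTorch’s per-sample losses:

```python
import torch

torch.manual_seed(1)

def NLLLoss(logs, targets, reduce=True):
    # pick the log-probability of the correct class for each sample
    out = torch.zeros_like(targets, dtype=torch.float)
    for i in range(len(targets)):
        out[i] = logs[i][targets[i]]
    # with reduce=False, return the per-sample losses instead of their mean
    return -(out.sum() / len(out) if reduce else out)

x = torch.randn(3, 5)
y = torch.LongTensor([0, 1, 2])
x_log = torch.nn.LogSoftmax(dim=1)(x)

# per-sample (unreduced) losses: PyTorch vs the custom function
torch_per_sample = torch.nn.NLLLoss(reduction='none')(x_log, y)
custom_per_sample = NLLLoss(x_log, y, reduce=False)
print(torch.allclose(torch_per_sample, custom_per_sample))  # True
```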

(Aki Rehn) #3

Thanks a million!

It works like a charm with LogSoftmax when I paste it into my notebook. Maybe I was trying too hard to avoid using for loops…

I’ll take a really close look at your code later, but for now I just want to thank you very, very much!


(Aki Rehn) #4

Hi.

I think I was getting pretty close. Here’s your code with the for loop removed:

def NLLLoss(logs, targets):
    out = torch.diag(logs[:,targets])
    return -out.sum()/len(out)

And replacing the sum and division with torch.mean() makes it a bit cleaner:

def NLLLoss(logs, targets):
    out = torch.diag(logs[:,targets])
    return -torch.mean(out)

And removing the Log from the output layer (i.e., using Softmax instead of LogSoftmax) gives this hand-coded CrossEntropyLoss:

def CrossEntropyLoss(logs, targets):
    out = torch.diag(logs[:,targets])
    return -torch.mean(torch.log(out))
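A quick check of that version against PyTorch (a sketch; the argument is renamed `probs` here since this variant takes Softmax probabilities rather than log-probabilities, while `torch.nn.CrossEntropyLoss` takes raw logits):

```python
import torch

torch.manual_seed(1)

def CrossEntropyLoss(probs, targets):
    # probs are Softmax outputs; pick each sample's correct-class probability
    out = torch.diag(probs[:, targets])
    return -torch.mean(torch.log(out))

x = torch.randn(3, 5)
y = torch.LongTensor([0, 1, 2])
probs = torch.nn.Softmax(dim=1)(x)

# the hand-coded version on probabilities matches PyTorch's loss on raw logits
print(torch.allclose(CrossEntropyLoss(probs, y), torch.nn.CrossEntropyLoss()(x, y)))  # True
```

Note that PyTorch computes this with log_softmax internally for numerical stability, so the explicit Softmax-then-log route is fine for understanding but less robust for very large or small logits.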

Nice, I made some big advances in my understanding today! :slight_smile:

Thanks again!


(Nikhil Prabhu) #5

Here’s a solution with the for loop removed by simply indexing into the matrix directly:

def NLLLoss(logs, targets):
    out = logs[range(len(targets)), targets]
    return -out.sum()/len(out)
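This uses PyTorch advanced indexing: `logs[range(len(targets)), targets]` gathers `logs[i, targets[i]]` for each row directly, avoiding the N×N intermediate matrix that the `diag` trick builds. A small sketch comparing the two:

```python
import torch

torch.manual_seed(1)

logs = torch.nn.LogSoftmax(dim=1)(torch.randn(4, 6))
targets = torch.LongTensor([0, 3, 5, 1])

indexed = logs[range(len(targets)), targets]  # shape (4,): one entry per sample
diagged = torch.diag(logs[:, targets])        # same values, via a 4x4 intermediate
print(torch.equal(indexed, diagged))  # True
```

An equivalent idiom is `logs.gather(1, targets.unsqueeze(1)).squeeze(1)`, which is what you’ll often see in library code.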