Why does nll() have a grad_fn while accuracy() doesn't?

In “03_minibatch_training.ipynb”, the loss function and the accuracy function both take the same inputs: the model output and the labels. The output of the loss function has a grad_fn, while the output of the accuracy function doesn’t (its grad_fn is None). Why is this? Is it because it’s mathematically impossible to differentiate the operations inside the accuracy function?

Here is an extract that demonstrates this observation:

from exp.nb_02 import *

x_train, y_train, x_valid, y_valid = get_data()

n, m = x_train.shape
c = int(y_train.max() + 1)
nh = 50

class Model(nn.Module):
    def __init__(self, n_in, nh, n_out):
        super().__init__()  # initialise nn.Module before assigning attributes
        self.layers = [nn.Linear(n_in, nh), nn.ReLU(), nn.Linear(nh, n_out)]
    def __call__(self, x): 
        for l in self.layers: x = l(x)
        return x

def logsumexp(x):
    m = x.max(-1)[0]
    return m + (x - m[:,None]).exp().sum(-1).log()
def log_softmax(x): return x - x.logsumexp(-1, keepdim=True)
def nll(inp, targ): return -inp[range(targ.shape[0]), targ].mean()
def cross_entropy(inp, targ): return nll(log_softmax(inp), targ)
def accuracy(out, yb):
    return (torch.argmax(out, dim=-1) == yb).float().mean()

model = Model(m, nh, c)
pred = model(x_train)
cross_entropy(pred, y_train).grad_fn, accuracy(pred, y_train).grad_fn

This returns:

(<NegBackward at 0x7f73d0fb2978>, None)

showing that the loss has a grad_fn of NegBackward, while the accuracy has None.

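Accuracy is just a metric you measure. It is computed with argmax() and ==, which return integer and boolean tensors, so autograd has nothing to track and the result has no grad_fn.
The loss is what supplies the gradient used for the optimization.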

Usually, a metric has no useful differentiable form (otherwise you could use it directly as a loss). This is why you need to optimize a differentiable loss function instead.
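To make the point above concrete, here is a minimal sketch that does not depend on the course notebook. The random `logits` tensor and the `targets` values are made-up stand-ins for the model output and labels, and `F.cross_entropy` is used in place of the hand-written `cross_entropy` from the question.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Assumption: random logits stand in for the model output (4 samples, 3 classes).
logits = torch.randn(4, 3, requires_grad=True)
targets = torch.tensor([0, 2, 1, 2])

# The loss is built only from differentiable ops, so autograd records a graph.
loss = F.cross_entropy(logits, targets)

# argmax returns integer class indices; autograd does not track them.
hard = torch.argmax(logits, dim=-1)
acc = (hard == targets).float().mean()

print(loss.grad_fn)                     # a backward node
print(hard.dtype, hard.requires_grad)   # integer tensor, not tracked
print(acc.grad_fn)                      # None
```

The graph is cut at `argmax`: everything computed after it, including `.float().mean()`, starts from a tensor that does not require grad.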


Ok yeah, it looks like argmax() has no derivative:

>> pred.argmax(dim=-1).grad_fn == None
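
On the "mathematically impossible" part of the question, one way to look at it is that accuracy is piecewise constant in the model output: a small change to the logits usually leaves every argmax unchanged, so any gradient would be zero almost everywhere and useless for optimization. A rough sketch, using hand-picked example values for `out` and `yb` rather than the notebook's data:

```python
import torch

def accuracy(out, yb):
    return (torch.argmax(out, dim=-1) == yb).float().mean()

# Assumption: hand-picked logits and labels, chosen so 2 of 3 rows are correct.
out = torch.tensor([[2.0, 0.5, -1.0],
                    [0.1, 0.2,  3.0],
                    [1.0, 1.5,  0.0]])
yb = torch.tensor([0, 2, 0])

torch.manual_seed(0)
noise = 1e-3 * torch.randn_like(out)   # tiny perturbation of the logits

base = accuracy(out, yb)
nudged = accuracy(out + noise, yb)

# Both values are the same: the perturbation is far too small to flip any argmax.
print(base.item(), nudged.item())
```

The loss, by contrast, changes smoothly with the logits, which is what gives the optimizer a usable direction.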