Computing gradient(s) with the fastai library

aayushy · July 26, 2018, 12:51pm

This is what I basically want to do during test/evaluation. Specifically, what I need is something like the snippet below:

    labels = Variable(torch.LongTensor([maxIndexTemp]).cuda(CUDA_DEVICE))
    loss = criterion(outputs, labels)
    loss.backward()
    
    
    # Normalizing the gradient to its sign, i.e. values in {-1, 1}
    gradient = torch.ge(inputs.grad.data, 0)
    gradient = (gradient.float() - 0.5) * 2

And this is what I am doing right now …

tst_dl = TextReader(tst_sent, bptt)
md = ModelData(dir_path, None, None, tst_dl)

learner = RNN_Learner(md, TextModel(m), opt_fn=opt_fn)

learner.load(fw_classifier)

learner.model.eval()
learner.model.reset()

total_loss = 0.

for inputs, targets in tqdm(tst_dl):
    outputs, raws, outs = learner.model(V(inputs))
    p_vocab = F.softmax(outputs,1)
    for i, pv in enumerate(p_vocab):
        targ_pred = pv[targets[i]]
        total_loss -= torch.log(targ_pred.detach())
mean = total_loss / (bptt * len(tst_dl))
print(f'Loss: {mean}, Perplexity: {np.exp(mean)}')

I really cannot figure out how to (a) backpropagate the loss/criterion and then (b) compute the gradients w.r.t. the input.

How does the fastai library handle loss and gradients? I tried V(inputs, requires_grad=True).grad, but all I get are None values.

Max333 · July 30, 2018, 8:22pm

You can look at this PyTorch tutorial to get a better idea of how gradient computation is handled in PyTorch.

We also did something similar in class for style transfer, where the input image was iterated upon. I don’t fully follow that code, so I made a simpler example below.

You’ll have to change a few things in your code:

You will have to shadow inputs with a version that tracks gradients, inputs = V(inputs, requires_grad=True), and pass that same variable to the model.
You must call mean.backward(), where mean is your final loss, for each gradient step. It runs the full backprop computation and stores the computed gradient on each tensor as its .grad attribute.
It is very important that you call inputs.grad.zero_() at some point inside your gradient descent loop. Otherwise the gradients will add up over the iterations.

Here is a linear regression example, but instead of fitting the parameters of the linear equation, I modify the input x values to get the best fit possible:

from fastai.core import V
import torch

x = V(torch.tensor([0.4, 0.8, 2.2, 3.5]), requires_grad=True)
# x = torch.tensor([0.4, 0.8, 2.2, 3.5], requires_grad=True) # works in pytorch 0.4.1
a = torch.tensor([2.])
b = torch.tensor([1.])

def model(x_): return a * x_ + b

y_target = torch.tensor([1., 3., 5., 7.])  # constant

lr = 0.1

for i in range(0, 10):
    y = model(x)
    # Would be better to use torch.nn.MSELoss(), but this is equivalent:
    loss = ((y - y_target) ** 2).mean()
    print(f"\niter {i}    loss: {loss}")

    loss.backward()

    print(f"x.grad: {x.grad}")

    # one gradient step
    x.data.sub_(lr * x.grad)

    print(f"x: {x.data}")
    # important to reset the gradient or they accumulate
    x.grad.zero_()
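
The accumulation that x.grad.zero_() guards against can be seen in isolation. This is a minimal sketch that assumes a recent PyTorch (0.4 or later) and uses plain tensors instead of fastai's V; the variable names are made up for illustration:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dx sum(x**2) = 2x
(x ** 2).sum().backward()
first = x.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
(x ** 2).sum().backward()
accumulated = x.grad.clone()

# After zeroing, a backward pass yields the plain gradient again
x.grad.zero_()
(x ** 2).sum().backward()
fresh = x.grad.clone()

print(first, accumulated, fresh)
```

Here accumulated is twice first, while fresh matches first again.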

aayushy · July 31, 2018, 11:54am

Thank you for your reply. Following your advice, I did the following:

.
.
.

learner.model.reset()

total_loss = 0.

for inputs, targets in tqdm(tst_dl):
    outputs, raws, outs = learner.model(V(inputs))
    p_vocab = F.softmax(outputs, 1)
    inputs = V(inputs, requires_grad=True)
    for i, pv in enumerate(p_vocab):
        targ_pred = pv[targets[i]]
        loss = torch.log(targ_pred)
        total_loss -= loss
        loss.backward()

        print(inputs.grad)
        inputs.grad.zero_()

mean = total_loss / (bptt * len(tst_dl))

I call loss.backward() inside the loop because I want the gradient of the loss for each <inputs, targets> pair. But this doesn’t work; instead, I get this error:

  File ".../3.6.4/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File ".../3.6.4/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: element 0 of variables tuple is volatile

Max333 · July 31, 2018, 1:53pm

I’m sorry, but I just realized I had not really looked at what you were trying to accomplish when I answered this question originally. I had only noticed at a glance that you were trying to get the gradient on the input, and I immediately (and wrongly) assumed that you were trying to do something similar to the style transfer example on pictures from class.

I now think the bug you got is irrelevant, because I’m not sure that what you are trying to do is possible at all. But I don’t really know what you are trying to accomplish. Your inputs are words which are encoded as integers (categories), i.e., cat is 42 and car is 13. You can’t take the gradient on such categorical data.
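
To illustrate the point about integer inputs, here is a small sketch assuming a recent PyTorch; the ids 42 and 13 are just the made-up values from the sentence above. Asking an integer tensor to track gradients is rejected outright:

```python
import torch

word_ids = torch.tensor([42, 13])  # hypothetical ids for "cat" and "car"

try:
    # Integer tensors cannot require gradients
    word_ids.requires_grad_(True)
    failed = False
except RuntimeError as err:
    failed = True
    print(err)
```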

However those indices are mapped to embeddings so that the neural net can deal with them. Taking the gradient on the embedding is possible and it is actually done automatically when the network is not frozen.
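
A minimal sketch of taking the gradient on the embedded inputs, written in plain PyTorch rather than against the fastai model from this thread. The toy model (an nn.Embedding followed by an nn.Linear), the sizes, and the ids are all assumptions made for illustration; the real RNN would need its own hook into the embedding output:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a text model: embedding followed by a linear classifier
vocab_size, emb_dim, n_classes = 50, 8, 3
embedding = nn.Embedding(vocab_size, emb_dim)
classifier = nn.Linear(emb_dim, n_classes)

word_ids = torch.tensor([42, 13, 7])  # integer inputs, no gradient possible
targets = torch.tensor([0, 2, 1])

embedded = embedding(word_ids)        # float tensor, shape (3, emb_dim)
embedded.retain_grad()                # keep .grad on this non-leaf tensor

loss = F.cross_entropy(classifier(embedded), targets)
loss.backward()

grad = embedded.grad                  # d loss / d embedded, shape (3, emb_dim)
# Sign normalization as in the first post: >= 0 maps to +1, < 0 maps to -1
grad_sign = (torch.ge(grad, 0).float() - 0.5) * 2
print(grad.shape, grad_sign.unique())
```

retain_grad() is needed because embedded is an intermediate result, and PyTorch only keeps .grad on leaf tensors by default.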

aayushy · August 3, 2018, 7:12am

Meaning that I should be able to save the gradients at train time? That should be good enough.

And the gradients can be fetched the same way as mentioned earlier, with x.grad, is that correct?

Max333 · August 3, 2018, 1:18pm

Yes, so it should be an easy task to get the updated embeddings, if everything else is already set up.
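
As a rough sketch of what that looks like at train time, assuming a plain nn.Embedding rather than the fastai model's own embedding layer: after backward(), the gradient sits on the embedding's weight matrix, and only the rows for the ids that were actually looked up are nonzero. The sizes, ids, and loss below are made up for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

embedding = nn.Embedding(10, 4)
word_ids = torch.tensor([2, 5])

# Any scalar loss that depends on the looked-up vectors
loss = embedding(word_ids).pow(2).sum()
loss.backward()

weight_grad = embedding.weight.grad  # shape (10, 4)
# Rows whose gradient is not all zeros correspond to the ids that were used
used_rows = weight_grad.abs().sum(dim=1).nonzero().flatten().tolist()
print(used_rows)
```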