You can look at this PyTorch tutorial to get a better idea of how gradient computation is handled in PyTorch.

We also did something similar in class for style transfer, where the input image was iterated upon. I don’t fully follow that code, so I made a simpler example below.

You’ll have to change a few things in your code:

You will have to shadow inputs with a variable that tracks gradients: inputs = V(inputs, requires_grad=True).

You must call mean.backward(), where mean is your final loss, for each gradient step. This runs the full backprop gradient computation and stores the computed gradient in the .grad attribute of each tensor that requires a gradient.

It is very important that you call inputs.grad.zero_() at some point inside your stochastic gradient loop. Otherwise the gradients will add up over the iterations.
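
To see why the reset matters, here is a minimal sketch. It assumes a recent PyTorch where a plain tensor accepts requires_grad=True (so no V wrapper is needed), and the tensor w and the toy loss are made up for illustration. Calling backward() twice without zeroing doubles the stored gradient:

```python
import torch

# Hypothetical toy tensor standing in for `inputs`.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# First backward pass: d/dw sum(w**2) = 2 * w
loss = (w ** 2).sum()
loss.backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is
# added to the one already stored in w.grad.
loss = (w ** 2).sum()
loss.backward()
accumulated = w.grad.clone()

# After zeroing, the next backward pass starts from a clean slate.
w.grad.zero_()
loss = (w ** 2).sum()
loss.backward()
fresh = w.grad.clone()

print(first, accumulated, fresh)
```

Here accumulated should be twice first, while fresh matches first again.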

Here is a linear regression example, but instead of fitting the parameters of the linear equation, I modify the input x values to get the best fit possible:

from fastai.core import V
import torch
x = V(torch.tensor([0.4, 0.8, 2.2, 3.5]), requires_grad=True)
# x = torch.tensor([0.4, 0.8, 2.2, 3.5], requires_grad=True) # works in pytorch 0.4.1
a = torch.tensor([2.])
b = torch.tensor([1.])
def model(x_): return a * x_ + b
y_target = torch.tensor([1., 3., 5., 7.]) # constant
lr = 0.1
for i in range(0, 10):
    y = model(x)
    # Would be better to use torch.nn.MSELoss(), but this is equivalent:
    loss = ((y - y_target) ** 2).mean()
    print(f"\niter {i} loss: {loss}")
    loss.backward()
    print(f"x.grad: {x.grad}")
    # one gradient step
    x.data.sub_(lr * x.grad)
    print(f"x: {x.data}")
    # important to reset the gradient or they accumulate
    x.grad.zero_()
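
As a sanity check on the example above, the gradient that backward() stores in x.grad can be compared with the closed form for this loss, d(loss)/dx_i = 2 * a * (a * x_i + b - y_i) / N. This is only a sketch, assuming a recent PyTorch without the V wrapper; the name expected is introduced here for illustration:

```python
import torch

x = torch.tensor([0.4, 0.8, 2.2, 3.5], requires_grad=True)
a = torch.tensor([2.0])
b = torch.tensor([1.0])
y_target = torch.tensor([1.0, 3.0, 5.0, 7.0])

# Same mean squared error as in the loop above.
loss = ((a * x + b - y_target) ** 2).mean()
loss.backward()

# Closed form: d(loss)/dx_i = 2 * a * (a * x_i + b - y_i) / N
with torch.no_grad():
    expected = 2 * a * (a * x + b - y_target) / x.numel()

print(f"autograd:    {x.grad}")
print(f"closed form: {expected}")
```

The two printed tensors should agree up to floating point error.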

Thank you for your reply. Following your advice, I did the following:

.
.
.
learner.model.reset()
total_loss = 0.
for inputs, targets in tqdm(tst_dl):
    outputs, raws, outs = learner.model(V(inputs))
    p_vocab = F.softmax(outputs, 1)
    inputs = V(inputs, requires_grad=True)
    for i, pv in enumerate(p_vocab):
        targ_pred = pv[targets[i]]
        loss = torch.log(targ_pred)
        total_loss -= loss
        loss.backward()
        print(inputs.grad)
        inputs.grad.zero_()
mean = total_loss / (bptt * len(tst_dl))

I call loss.backward() because I want the gradient of the loss corresponding to each <inputs, targets> pair. But this doesn’t work. Instead, I get this error:

File ".../3.6.4/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File ".../3.6.4/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: element 0 of variables tuple is volatile

I’m sorry, but I just realized I had not really looked at what you were trying to accomplish when I originally answered this question. I had only noticed that you were trying to get the gradient on the input, and I immediately (and wrongly) assumed that you were trying to do something similar to the style transfer example on pictures from class.

I now think the bug you got is irrelevant, because I’m not sure that what you are trying to do is possible at all. But I don’t really know what you are trying to accomplish. Your inputs are words encoded as integers (categories), e.g. cat is 42 and car is 13. You can’t take the gradient with respect to such categorical data, since an index is a discrete label rather than a quantity the loss varies smoothly with.

However, those indices are mapped to embeddings so that the neural net can deal with them. Taking the gradient with respect to the embedding is possible, and it is actually done automatically when the network is not frozen.
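
Here is a small sketch of that distinction, assuming a recent PyTorch. The vocabulary size, embedding width and the squared-sum loss are made up for illustration and are not taken from your language model. Integer indices refuse requires_grad, while the embedded vectors and the embedding matrix both receive gradients:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Integer word indices cannot require gradients.
try:
    torch.tensor([42, 13], requires_grad=True)
    int_grad_ok = True
except RuntimeError:
    int_grad_ok = False

# Hypothetical tiny vocabulary: 50 words, 4-dimensional embeddings.
emb = nn.Embedding(50, 4)
idx = torch.tensor([42, 13])  # e.g. "cat" and "car"

vectors = emb(idx)           # float tensor, part of the autograd graph
vectors.retain_grad()        # keep the gradient on this non-leaf tensor
loss = (vectors ** 2).sum()  # stand-in for a real loss
loss.backward()

# Rows of the embedding matrix that received a nonzero gradient.
rows = emb.weight.grad.abs().sum(dim=1).nonzero().flatten()

print(int_grad_ok)
print(vectors.grad.shape)
print(rows)
```

Only the rows of the embedding matrix that were looked up (42 and 13 here) get a nonzero gradient, and vectors.grad gives the gradient with respect to the embedded input itself.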