Hey there,
I’m trying to reimplement the SGD MNIST digit classifier from lesson 4. I’m doing it without copying the lesson code exactly, and I'm using PyTorch directly rather than going through fastai. However, I’ve got stuck on a bug where `grad` seems to be `None` after I call `loss.backward()`.
Here’s the error I get:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-22a0da261727> in <module>
     23
     24         with torch.no_grad():
---> 25             weights -= weights.grad * LR
     26             bias -= bias * LR
     27
TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'
```
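The `NoneType` here means `weights.grad` was never filled in. One thing worth checking is whether the tensor is a *leaf*: PyTorch only keeps `.grad` on leaf tensors that require grad, and multiplying a leaf by a number produces a new, non-leaf tensor. A minimal standalone sketch with made-up tensors (not the code from this post):

```python
import torch

# A leaf tensor: created directly with requires_grad=True
leaf = torch.randn(3, requires_grad=True)

# Multiplying by a number returns the *output* of an op: it has a grad_fn,
# so it is not a leaf, even though requires_grad=True was passed to randn
scaled = torch.randn(3, requires_grad=True) * 1.0

print(leaf.is_leaf, scaled.is_leaf)  # True False

(leaf.sum() + scaled.sum()).backward()

print(leaf.grad)    # tensor([1., 1., 1.])
print(scaled.grad)  # None: .grad is not kept for non-leaf tensors by default
```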
If I run the code again, I get a different error, namely:
```
RuntimeError                              Traceback (most recent call last)
<ipython-input-25-455a55143419> in <module>
      7         predictions = xb@weights + bias
      8         loss = get_loss(predictions, yb)
----> 9         loss.backward()
     10
     11         with torch.no_grad():

/usr/local/lib/python3.6/dist-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    116                 products. Defaults to ``False``.
    117         """
--> 118         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    119
    120     def register_hook(self, hook):

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     91     Variable._execution_engine.run_backward(
     92         tensors, grad_tensors, retain_graph, create_graph,
---> 93         allow_unreachable=True)  # allow_unreachable flag
     94
     95

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
```
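This second error is autograd refusing to reuse part of a graph whose saved intermediate values were already freed by an earlier `backward()`. A minimal standalone sketch of that mechanism, with hypothetical tensors `x` and `y` (not the code from this post): a non-leaf tensor created once and reused later still points back at the op that created it, so a later `backward()` walks into that already-freed node.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.exp()  # non-leaf; exp's backward needs its saved output

y.sum().backward()  # first pass through exp's node works, then frees the saved output

try:
    y.sum().backward()  # a new sum, but it still leads back through the freed exp node
except RuntimeError as err:
    message = str(err)
    print(message.splitlines()[0])

# Rebuilding the graph from the leaf on every step avoids the problem
x.grad = None
for _ in range(2):
    x.exp().sum().backward()  # fresh graph each iteration; gradients accumulate in x.grad
print(x.grad)  # equals 2 * exp(x) after two passes
```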
And here’s what I think are the relevant parts of my code:
```python
# Skipped DataLoader setup for brevity

def get_accuracy(predictions, actual):
    return (predictions >= 0.5).float() == actual

def get_loss(predictions, actual):
    normalised = predictions.sigmoid()
    return torch.where(actual == IS_7, 1 - normalised, normalised).mean()

def init_params(size, variance=1.0):
    return torch.randn(size, dtype=torch.float, requires_grad=True) * variance

weights = init_params((IMG_SIZE, 1))
bias = init_params(1)

for epoch in range(1):
    # Iterate over dataset batches
    # xb is a tensor with the independent variables for the batch (tensor of pixel values)
    # yb is a tensor with the dependent variable for the batch (which digit it is)
    for xb, yb in dl:
        print(xb.shape)
        predictions = xb@weights + bias
        loss = get_loss(predictions, yb)
        loss.backward()

        with torch.no_grad():
            weights -= weights.grad * LR  # <-- Error here: unsupported operand type(s) for *: 'NoneType' and 'float'
            bias -= bias * LR

        weights.grad.zero_()
        bias.grad.zero_()
```
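For comparison, here is a self-contained sketch of one way to keep both parameters as leaf tensors: scale the random values first, then call `requires_grad_()` on the result. It uses random stand-in data instead of MNIST, and `IMG_SIZE`, `LR`, the batch shapes and the 0/1 target convention are all assumptions. It also steps the bias with `bias.grad` rather than `bias`, since `bias -= bias * LR` shrinks the bias by itself instead of moving it along its gradient. Multiplying `torch.randn` output by a factor scales the standard deviation, so the sketch names that argument `std`.

```python
import torch

torch.manual_seed(0)

IMG_SIZE = 28 * 28  # assumption: flattened 28x28 MNIST images
LR = 1.0            # assumption: learning rate chosen for illustration only

def init_params(size, std=1.0):
    # Scale first, then mark the result as requiring grad, so it stays a leaf tensor
    return (torch.randn(size) * std).requires_grad_()

def get_loss(predictions, targets):
    # Assumption: targets are 1.0 for the positive digit, 0.0 otherwise,
    # with the same (batch, 1) shape as predictions to avoid accidental broadcasting
    normalised = predictions.sigmoid()
    return torch.where(targets == 1, 1 - normalised, normalised).mean()

weights = init_params((IMG_SIZE, 1))
bias = init_params(1)

# Hypothetical stand-in for the DataLoader: 5 random batches of 64 "images"
batches = [(torch.rand(64, IMG_SIZE), torch.randint(0, 2, (64, 1)).float()) for _ in range(5)]

for xb, yb in batches:
    predictions = xb @ weights + bias
    loss = get_loss(predictions, yb)
    loss.backward()

    with torch.no_grad():
        weights -= weights.grad * LR
        bias -= bias.grad * LR  # step along the gradient, not the bias itself

    weights.grad.zero_()
    bias.grad.zero_()

print(weights.is_leaf, bias.is_leaf)  # True True
```

Because `weights` and `bias` are leaves, every batch builds a fresh graph starting from them, so repeated `backward()` calls never reach a node freed by an earlier batch.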
Some useful notes:
- I also tried using `.data` instead of `with torch.no_grad()`, but that didn’t help. `with torch.no_grad()` seems to be the method PyTorch prefers (https://pytorch.org/tutorials/beginner/pytorch_with_examples.html).
- Using `@` vs `torch.mm` for the matrix multiplication in `predictions` makes no difference.
- I previously made a mistake with my tensor setup, but I think that’s all fixed now: `weights.shape, bias.shape` outputs `(torch.Size([784, 1]), torch.Size([1]))`.
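On the first two notes: for 2-D tensors `@` and `torch.mm` compute the same matrix product, and both `.data` and `torch.no_grad()` keep an in-place parameter update out of the autograd graph. Neither choice affects whether `.grad` gets populated in the first place, which is consistent with swapping them making no difference. A small standalone check with made-up tensors `a`, `b`, `w1`, `w2`:

```python
import torch

a = torch.randn(4, 3)
b = torch.randn(3, 1)

# For 2-D tensors, the @ operator and torch.mm compute the same matrix product
same_product = torch.allclose(a @ b, torch.mm(a, b))

# Two ways to apply an in-place update to a leaf parameter without recording it
w1 = torch.ones(3, requires_grad=True)
w2 = torch.ones(3, requires_grad=True)
step = torch.full((3,), 0.1)

with torch.no_grad():
    w1 -= step  # style used in the PyTorch tutorial linked above

w2.data -= step  # older style; the change is not tracked by autograd

print(same_product)                                 # True
print(torch.equal(w1, w2), w1.is_leaf, w2.is_leaf)  # True True True
```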