Hello!
Here’s the rundown. I have included what I deem necessary for understanding the problem.
I have a linear model using PyTorch’s nn.Linear class:
from torch import nn

linear_model = nn.Linear(28*28, 1)
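For context, the model just maps a batch of flattened 28x28 inputs to one output per example; a quick sanity check with a made-up batch (the real batches come from train_dataloader, which I haven’t shown) looks like this:
import torch

x_dummy = torch.randn(64, 28*28)    # hypothetical batch of 64 flattened images
print(linear_model(x_dummy).shape)  # torch.Size([64, 1])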
I have a custom optimizer class:
class Optimizer:
    """A simple attempt to model an optimizer."""

    def __init__(self, parameters, learning_rate):
        """Initialize optimizer attributes."""
        self.parameters = list(parameters)
        self.learning_rate = learning_rate

    def step(self):
        """Update the parameters."""
        for parameter in self.parameters:
            parameter.data -= parameter.data.grad * self.learning_rate

    def zero_gradient(self):
        """Reset the gradient of each parameter."""
        for parameter in self.parameters:
            parameter.grad = None
I have instantiated this class as follows:
optimizer = Optimizer(linear_model.parameters(), 1)
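For what it’s worth, the optimizer does end up holding the model’s two parameter tensors (the weight and the bias):
print([p.shape for p in optimizer.parameters])
# [torch.Size([1, 784]), torch.Size([1])]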
The gradients of this model are calculated by the following function:
def calculate_gradients(x_batch, y_batch, model):
    predictions = model(x_batch)
    loss = l1_norm(predictions, y_batch)
    loss.backward()
l1_norm is defined as follows:
import torch

def l1_norm(predictions, targets):
    predictions = predictions.sigmoid()
    return torch.where(targets == 1, 1 - predictions, predictions).mean()
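To illustrate what this loss computes, here is a toy example with made-up values (the real predictions and targets come from the batches):
predictions = torch.tensor([2.0, -1.0])  # made-up raw model outputs
targets = torch.tensor([1, 0])           # made-up labels
# sigmoid(2.0) ≈ 0.881 and sigmoid(-1.0) ≈ 0.269, so the per-example losses
# are 1 - 0.881 ≈ 0.119 (target 1) and 0.269 (target 0)
print(l1_norm(predictions, targets))     # roughly tensor(0.1941)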
When I run the model, a TypeError is thrown on the very first epoch:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 train_model(linear_model, 20)

Input In [16], in train_model(model, epochs)
      1 def train_model(model, epochs):
      2     for epoch in range(epochs):
----> 3         train_epoch(model)
      4         print(valid_epoch(model), end=', ')

Input In [15], in train_epoch(model)
      2     for x_batch, y_batch in train_dataloader:
      3         calculate_gradients(x_batch, y_batch, model)
----> 4         optimizer.step()
      5         optimizer.zero_gradient()

Input In [9], in Optimizer.step(self)
     10         """Update the parameters."""
     11         for parameter in self.parameters:
---> 12             parameter.data -= parameter.data.grad * self.learning_rate

TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
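For completeness, here is the training loop the traceback refers to (reconstructed from the frames above; valid_epoch and train_dataloader are defined elsewhere in the notebook):
def train_model(model, epochs):
    for epoch in range(epochs):
        train_epoch(model)
        print(valid_epoch(model), end=', ')

def train_epoch(model):
    for x_batch, y_batch in train_dataloader:
        calculate_gradients(x_batch, y_batch, model)
        optimizer.step()
        optimizer.zero_gradient()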
What I’ve gathered is that, despite loss.backward() being run in calculate_gradients, parameter.data.grad remains None and hence cannot be multiplied with self.learning_rate.
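The failure shows up on the very first batch, so it can be reproduced without the full training loop (a minimal check, pulling one batch from the same train_dataloader):
x_batch, y_batch = next(iter(train_dataloader))
calculate_gradients(x_batch, y_batch, linear_model)
for parameter in linear_model.parameters():
    print(parameter.data.grad)  # None for both the weight and the bias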
Does anyone have an idea as to why loss.backward() is leaving parameter.data.grad as None? nn.Linear does make sure requires_grad is set to True for its parameters, so I’m not quite sure what’s going wrong.
I would really appreciate any help! Please let me know if anything is still unclear or if I should supply more information.