I always thought a -= b is exactly the same as a = a - b; however, I found out that if I do something like this
for l in model.layers:
    if hasattr(l, 'weight'):
        l.weight = l.weight - lr * l.weight.grad
I will get an error
"TypeError: cannot assign 'torch.FloatTensor' as parameter 'weight' (torch.nn.Parameter or None expected)"
In my code, l is an nn.Module and l.weight is a torch.nn.Parameter, while l.weight - lr * l.weight.grad evaluates to a plain torch.Tensor, so I understand where this error comes from.
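A quick way to see the types involved (a minimal sketch; the linear layer and the 0.1 learning rate are made up for illustration):

import torch

m = torch.nn.Linear(10, 1)
m(torch.randn(2, 10)).sum().backward()       # populate m.weight.grad
print(type(m.weight))                        # <class 'torch.nn.parameter.Parameter'>
print(type(m.weight - 0.1 * m.weight.grad))  # <class 'torch.Tensor'>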
What surprises me is that if we rewrite it with -= (like below), it works fine!
for l in model.layers:
    if hasattr(l, 'weight'):
        l.weight -= lr * l.weight.grad
I want to dig a little deeper to understand what causes this difference, but I don't know where to start; from my reading of the Python docs, these two should be exactly the same. If anyone knows anything, I would be really curious to hear it.
Sure - I created a small code example below. Basically, when we try to update the weight attribute of an nn.Module object, it throws a TypeError if the new value is not an nn.Parameter; for some reason we can get around this when we use -= or +=.
import torch

m = torch.nn.Linear(10, 1)

# this won't work - we get a TypeError, the weight attribute of an nn.Module has to be a Parameter
with torch.no_grad():
    m.weight = m.weight + torch.randn(m.weight.shape)

# however, when we use += we get around the TypeError - the code below works fine
with torch.no_grad():
    m.weight += torch.randn(m.weight.shape)
Basically, the error says that weight has to be of type nn.Parameter, not a plain FloatTensor. Converting the result back to a Parameter makes the a = a + b version work. It has to be of type Parameter as it also stores gradients. The code below works.
import torch.nn as nn

with torch.no_grad():
    m.weight = nn.Parameter(m.weight + torch.randn(m.weight.shape))
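One side note on this approach (a sketch continuing the example above): re-wrapping creates a brand-new Parameter object, so anything that captured a reference to the old one, e.g. an optimizer, will still be pointing at the old tensor:

import torch
import torch.nn as nn

m = nn.Linear(10, 1)
opt = torch.optim.SGD(m.parameters(), lr=0.1)
old_weight = m.weight

with torch.no_grad():
    m.weight = nn.Parameter(m.weight + torch.randn(m.weight.shape))

print(m.weight is old_weight)                           # False - a new Parameter
print(opt.param_groups[0]['params'][0] is old_weight)   # True - optimizer still holds the old one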
Why does this work?
with torch.no_grad():
    m.weight += torch.randn(m.weight.shape)
My guess is that this operator is overridden somewhere in the torch code to do the cast automatically. I could not find the source code for it. If I find it, I will update this post.
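If it helps the search: the TypeError in question is raised by the type check in nn.Module.__setattr__. One way to locate the file (a small sketch using the standard inspect module):

import inspect
import torch

# prints the path to torch/nn/modules/module.py;
# searching that file for "cannot assign" lands on the check in Module.__setattr__
print(inspect.getsourcefile(torch.nn.Module))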
Most generally, the difference is that a += b is an in-place operation while a = a + b is an out-of-place operation. Translating to PyTorch, the former is a.add_(b) and the latter is a = a.add(b). I would dig in that direction to understand the underlying error.
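For reference, a minimal sketch of that distinction on plain tensors:

import torch

a = torch.tensor([1., 2.])
b = torch.tensor([3., 4.])

c = a.add(b)   # out of place: returns a new tensor, a is unchanged
a.add_(b)      # in place: mutates a itself, a is now tensor([4., 6.])
print(a, c)    # tensor([4., 6.]) tensor([4., 6.])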
Thanks everyone, so here is what I’ve learnt so far:
a += b is NOT syntactic sugar for a = a + b: one mutates the object in place while the other creates a new object; this is not unique to PyTorch - you can see the same behavior with a Python list (see the sketch below)
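For example (a minimal sketch with plain lists):

xs = [1, 2]
alias = xs
xs += [3]        # list.__iadd__ mutates the list in place
print(alias)     # [1, 2, 3] - the alias sees the change
xs = xs + [4]    # list.__add__ builds a new list and rebinds xs
print(alias)     # [1, 2, 3] - the alias still points at the old list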
Normally, when you add an nn.Parameter and a regular torch.Tensor together, you get a regular tensor; however, an in-place operation only modifies the .data of the existing tensor, so the type of the object does not change
import torch
import torch.nn as nn

a = nn.Parameter(torch.tensor([1., 2.]))
b = torch.tensor([3., 4.])
a = a + b

a is a plain torch.Tensor after the code above
a = nn.Parameter(torch.tensor([1., 2.]))
b = torch.tensor([3., 4.])
with torch.no_grad():  # in-place ops on a leaf that requires grad must run under no_grad
    a += b

a is still an nn.Parameter after the code above
one more thing I just learnt that's relevant - an in-place update does still go through __setattr__ (Python rewrites a.weight += b as roughly a.weight = a.weight.__iadd__(b)), but __iadd__ mutates the tensor and returns the same nn.Parameter object, so the type check inside nn.Module.__setattr__ passes and no exception is raised
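To convince myself, here is a small sketch with no PyTorch at all - a class that mimics nn.Module's type check (Mod and MyParam are made-up names for illustration):

class MyParam(list):
    """Stands in for nn.Parameter: a subclass of a mutable type."""
    pass

class Mod:
    """Stands in for nn.Module: only accepts MyParam for 'weight'."""
    def __setattr__(self, name, value):
        print(f"__setattr__({name!r}, {type(value).__name__})")
        if name == "weight" and not isinstance(value, MyParam):
            raise TypeError(f"cannot assign '{type(value).__name__}' as 'weight'")
        object.__setattr__(self, name, value)

m = Mod()
m.weight = MyParam([1, 2])  # __setattr__('weight', MyParam) - ok
m.weight += [3]             # __setattr__ fires again, but list.__iadd__ returned
                            # the same MyParam object, so the check passes
m.weight = m.weight + [4]   # list.__add__ returns a plain list -> TypeError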