# What's the difference between `a -= b` and `a = a - b`?

I always thought `a -= b` is exactly the same as `a = a - b`; however, I found out that if I do something like this

```
for l in model.layers:
    if hasattr(l, 'weight'):
        l.weight = l.weight - lr * l.weight.grad
```

I will get an error:

`TypeError: cannot assign 'torch.FloatTensor' as parameter 'weight' (torch.nn.Parameter or None expected)`

In my code, `l` is an `nn.Module` and `l.weight` is a `torch.nn.Parameter`, while `lr * l.weight.grad` is a plain `torch.Tensor`, so the result of the subtraction is a plain tensor and I understand where this error comes from.
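For reference, a quick type check confirms this (a minimal sketch; the `nn.Linear` and the dummy backward pass are only there to populate a `.grad`):

```
import torch

# minimal sketch: one layer with a populated .grad, just to inspect the types
l = torch.nn.Linear(10, 1)
l(torch.randn(2, 10)).sum().backward()
lr = 0.1

print(type(l.weight))                       # <class 'torch.nn.parameter.Parameter'>
print(type(l.weight - lr * l.weight.grad))  # <class 'torch.Tensor'>
```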

What surprises me is that if we rewrite it with `-=` (like below), it works fine!

```
for l in model.layers:
    if hasattr(l, 'weight'):
        l.weight -= lr * l.weight.grad
```

I want to dig a little deeper to understand what causes this difference, but I don't know where to start; according to the Python docs these two things should be exactly the same. If anyone knows anything, I would be really curious to hear it.

Thanks


Try writing it like this:

```
l.weight = l.weight - (lr * l.weight.grad)
```

Tried it, and it gives the same error.

Out of curiosity, why did you think it would make a difference? I always thought `a - b * c` would be the same as `a - (b*c)`.

I guess they are the same. Is it possible to share a small piece of code that reproduces this error? It would make it a lot easier to debug.

Sure - I created a small code example below. Basically, when we try to update the `weight` attribute of an `nn.Module` object, it throws a `TypeError` if the new value is not an `nn.Parameter`; for some reason we can get around this when we use `-=` or `+=`.

```
import torch

m = torch.nn.Linear(10, 1)

# this won't work - it raises a TypeError: the weight attribute of an nn.Module has to be a Parameter
with torch.no_grad():
    m.weight = m.weight + torch.randn(m.weight.shape)

# however, with += we get around the TypeError - the code below works fine
with torch.no_grad():
    m.weight += torch.randn(m.weight.shape)
```

Basically the error says that `weight` has to be of type `nn.Parameter`, not a plain `FloatTensor`. Wrapping the result in `Parameter` makes the `a + b` form work. It has to be of type `Parameter` because it also stores gradients. The code below works.

```
from torch import nn   # nn was not imported in the snippet above

with torch.no_grad():
    m.weight = nn.Parameter(m.weight + torch.randn(m.weight.shape))
```

Why does this work?

```
with torch.no_grad():
    m.weight += torch.randn(m.weight.shape)
```

My guess is that this operation is overridden somewhere in the torch code to do the auto-casting. I could not find the relevant source code; if I find it, I will update this here.
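In the meantime, a quick sanity check (a small sketch reusing the `nn.Linear` setup from above) suggests there is no casting at all: the `+=` never replaces the `weight` attribute with a new tensor, it mutates the existing `Parameter`:

```
import torch

m = torch.nn.Linear(10, 1)
w_before = m.weight                     # keep a handle on the original Parameter object

with torch.no_grad():
    m.weight += torch.randn(m.weight.shape)

print(m.weight is w_before)             # True - still the very same object
print(type(m.weight))                   # <class 'torch.nn.parameter.Parameter'>
```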


Most generally, the difference is that `a += b` is an in-place operation while `a = a + b` is an out-of-place operation. Translated to PyTorch, the former is `a.add_(b)` and the latter is `a = a.add(b)`. I would dig in that direction to understand the underlying error.
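For example (a small sketch using nothing beyond `add`/`add_`), you can see the difference in whether the original tensor object is mutated:

```
import torch

a = torch.tensor([1., 2.])
b = torch.tensor([3., 4.])

c = a.add(b)    # out of place: returns a new tensor, a is untouched
print(c is a)   # False
print(a)        # tensor([1., 2.])

a.add_(b)       # in place: mutates a itself (what a += b boils down to)
print(a)        # tensor([4., 6.])
```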


Thanks everyone, so here is what I’ve learnt so far:

1. `a += b` is NOT syntactic sugar for `a = a + b`: one mutates the object in place while the other creates a new object and rebinds the name. This is not unique to PyTorch - you can see the same thing with plain Python lists (see the list example after the `Parameter` snippets below).
2. Normally, when you add an `nn.Parameter` and a regular `torch.Tensor` together you get a regular tensor. With an in-place operation, however, you only modify the data of the existing tensor, so the type of the object does not change:
```
import torch
from torch import nn

a = nn.Parameter(torch.tensor([1., 2.]))
b = torch.tensor([3., 4.])
a = a + b
```

`a` is a plain `torch.Tensor` in the code above

```
a = nn.Parameter(torch.tensor([1., 2.]))
b = torch.tensor([3., 4.])
with torch.no_grad():   # in-place ops on a leaf Parameter need no_grad, as above
    a += b
```

`type(a)` still returns `torch.nn.parameter.Parameter` - it is the same `Parameter` object, mutated in place
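And to see that point 1 is plain Python behaviour rather than anything PyTorch-specific, here is the list example mentioned above (a small sketch):

```
a = [1, 2]
b = a           # b refers to the same list object as a

a = a + [3]     # out of place: builds a new list and rebinds a
print(b)        # [1, 2]    - the original list is untouched

a = [1, 2]
b = a
a += [3]        # in place: extends the existing list (list.__iadd__)
print(b)        # [1, 2, 3] - b sees the change, it is still the same object
```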


One more thing I just learnt that's relevant: with an in-place update like `-=`, Python's augmented assignment does still end up calling `__setattr__`, but the in-place operation returns the very same `nn.Parameter` object, so `nn.Module.__setattr__` receives a `Parameter` back and its type check passes. The attribute is never rebound to a plain tensor, which is why you don't get an exception.
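You can see the same mechanism with a tiny class that logs `__setattr__` calls (a plain-Python sketch, no PyTorch involved):

```
class Logger:
    def __setattr__(self, name, value):
        print(f"__setattr__ called with a {type(value).__name__}")
        super().__setattr__(name, value)

obj = Logger()
obj.x = [1, 2]   # prints: __setattr__ called with a list
obj.x += [3]     # prints again: += still goes through __setattr__, but
                 # list.__iadd__ returned the same (mutated) list object
```

In the `nn.Module` case, the object handed back to `__setattr__` is the original `Parameter`, hence no type error.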