I am studying the Stochastic Gradient Descent section in chapter 4 of the book and playing around with `.grad`. One thing surprised me, and I would like someone to help me clarify it. When I apply

`w -= w.grad * lr`

it affects the original tensor, but if I replace it with

`w = w - (w.grad * lr)`

it does not affect the original one. Why is that?

The code is as follows:

`-=` example:

```python
from torch import tensor  # in the book this comes from fastai; torch.tensor behaves the same here

a = tensor(1.).requires_grad_()
a.grad = tensor(0.9)
a.grad        # tensor(0.9000)

b = a.grad    # b is another name for the same tensor object
b -= b * 0.1  # in-place subtraction
b             # tensor(0.8100)
a.grad        # tensor(0.8100) -- changed too!
```
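To dig a bit further, here is a small standalone check (plain PyTorch, my own sketch; `data_ptr()` is a standard tensor method) that should confirm `b` and `a.grad` are literally the same object, so the in-place `-=` writes into storage they share:

```python
import torch

a = torch.tensor(1.).requires_grad_()
a.grad = torch.tensor(0.9)

b = a.grad
print(b is a.grad)                        # True: two names, one tensor object
print(b.data_ptr() == a.grad.data_ptr())  # True: same underlying storage

b -= b * 0.1                              # in-place write into that storage
print(a.grad)                             # tensor(0.8100): a.grad sees it too
```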

non-`-=` example:

```python
a = tensor(1.).requires_grad_()
a.grad = tensor(0.9)
a.grad        # tensor(0.9000)

b = a.grad
b = b - (b * 0.1)  # out-of-place: rebinds b to a new tensor
b             # tensor(0.8100)
a.grad        # tensor(0.9000) -- unchanged
```
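And the mirror-image check for the `=` version, which should show that `b - (b * 0.1)` allocates a brand-new tensor and the name `b` is simply rebound to it:

```python
import torch

a = torch.tensor(1.).requires_grad_()
a.grad = torch.tensor(0.9)

b = a.grad
b = b - (b * 0.1)     # out-of-place: result is a freshly allocated tensor
print(b is a.grad)    # False: b now points at the new tensor
print(b)              # tensor(0.8100)
print(a.grad)         # tensor(0.9000): the original was never modified
```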

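Coming back to the SGD step that prompted this: if I understand it right, the in-place form matters because everything else keeps referring to the same parameter tensor. A minimal toy sketch in plain PyTorch (the `x`, `y`, `w`, and `lr` values are just my own example, and I am assuming the usual pattern of running the update under `torch.no_grad()`):

```python
import torch

# Toy data: want w * x to approximate y (the true w is 2).
x = torch.tensor([1., 2., 3.])
y = torch.tensor([2., 4., 6.])
w = torch.tensor(0.5, requires_grad=True)
lr = 0.1

loss = ((w * x - y) ** 2).mean()
loss.backward()

with torch.no_grad():  # the update step itself should not be tracked
    w -= w.grad * lr   # in-place: w stays the same tensor object
w.grad.zero_()         # reset the gradient for the next step

print(w)  # updated value; w is still the same leaf tensor
```

With `w = w - (w.grad * lr)` instead, the name `w` would be rebound to a new tensor, while anything else still holding the old parameter (for example a list of params) would keep seeing the stale values.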