I am studying the Stochastic Gradient Descent section in chapter 4 of the book and playing around with `.grad`. One thing surprised me, and I would like someone to help me clarify it. When I apply

`w -= w.grad * lr`

it affects the original tensor, but if I replace it with

`w = w - (w.grad * lr)`

it does not affect the original one. Why is that?

The code is as follows:

`-=` example:

```python
from torch import tensor  # in the book this comes from fastai; torch.tensor behaves the same here

a = tensor(1.).requires_grad_()
a.grad = tensor(0.9)
a.grad        # tensor(0.9000)

b = a.grad    # b is another name for the same tensor object
b -= b * 0.1  # in-place subtraction
b             # tensor(0.8100)
a.grad        # tensor(0.8100) -- changed too!
```
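To dig a bit further, here is a small standalone check (plain PyTorch, my own sketch; `data_ptr()` is a standard tensor method) that should confirm `b` and `a.grad` are literally the same object, so the in-place `-=` writes into storage they share:

```python
import torch

a = torch.tensor(1.).requires_grad_()
a.grad = torch.tensor(0.9)

b = a.grad
print(b is a.grad)                        # True: two names, one tensor object
print(b.data_ptr() == a.grad.data_ptr())  # True: same underlying storage

b -= b * 0.1                              # in-place write into that storage
print(a.grad)                             # tensor(0.8100): a.grad sees it too
```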

non-`-=` example:

```python
a = tensor(1.).requires_grad_()
a.grad = tensor(0.9)
a.grad        # tensor(0.9000)

b = a.grad
b = b - (b * 0.1)  # out-of-place: rebinds b to a new tensor
b             # tensor(0.8100)
a.grad        # tensor(0.9000) -- unchanged
```
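And the mirror-image check for the `=` version, which should show that `b - (b * 0.1)` allocates a brand-new tensor and the name `b` is simply rebound to it:

```python
import torch

a = torch.tensor(1.).requires_grad_()
a.grad = torch.tensor(0.9)

b = a.grad
b = b - (b * 0.1)     # out-of-place: result is a freshly allocated tensor
print(b is a.grad)    # False: b now points at the new tensor
print(b)              # tensor(0.8100)
print(a.grad)         # tensor(0.9000): the original was never modified
```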

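Coming back to the SGD step that prompted this: if I understand it right, the in-place form matters because everything else keeps referring to the same parameter tensor. A minimal toy sketch in plain PyTorch (the `x`, `y`, `w`, and `lr` values are just my own example, and I am assuming the usual pattern of running the update under `torch.no_grad()`):

```python
import torch

# Toy data: want w * x to approximate y (the true w is 2).
x = torch.tensor([1., 2., 3.])
y = torch.tensor([2., 4., 6.])
w = torch.tensor(0.5, requires_grad=True)
lr = 0.1

loss = ((w * x - y) ** 2).mean()
loss.backward()

with torch.no_grad():  # the update step itself should not be tracked
    w -= w.grad * lr   # in-place: w stays the same tensor object
w.grad.zero_()         # reset the gradient for the next step

print(w)  # updated value; w is still the same leaf tensor
```

With `w = w - (w.grad * lr)` instead, the name `w` would be rebound to a new tensor, while anything else still holding the old parameter (for example a list of params) would keep seeing the stale values.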