I am studying the Stochastic Gradient Descent section in chapter 4 of the book and playing around with `.grad`. One thing surprised me, and I would like someone to help me clarify it. When I apply

`w -= w.grad * lr`

it affects the original tensor, but if I replace it with

`w = w - (w.grad * lr)`

it does not affect the original one. Why is that?

The code is as follows:

`-=` example:

```python
from torch import tensor  # in the book this comes from fastai; torch.tensor behaves the same here

a = tensor(1.).requires_grad_()
a.grad = tensor(0.9)
a.grad        # tensor(0.9000)

b = a.grad    # b is another name for the same tensor object
b -= b * 0.1  # in-place subtraction
b             # tensor(0.8100)
a.grad        # tensor(0.8100) -- changed too!
```
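To dig a bit further, here is a small standalone check (plain PyTorch, my own sketch; `data_ptr()` is a standard tensor method) that should confirm `b` and `a.grad` are literally the same object, so the in-place `-=` writes into storage they share:

```python
import torch

a = torch.tensor(1.).requires_grad_()
a.grad = torch.tensor(0.9)

b = a.grad
print(b is a.grad)                        # True: two names, one tensor object
print(b.data_ptr() == a.grad.data_ptr())  # True: same underlying storage

b -= b * 0.1                              # in-place write into that storage
print(a.grad)                             # tensor(0.8100): a.grad sees it too
```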

non-`-=` example:

```python
a = tensor(1.).requires_grad_()
a.grad = tensor(0.9)
a.grad        # tensor(0.9000)

b = a.grad
b = b - (b * 0.1)  # out-of-place: rebinds b to a new tensor
b             # tensor(0.8100)
a.grad        # tensor(0.9000) -- unchanged
```
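And the mirror-image check for the `=` version, which should show that `b - (b * 0.1)` allocates a brand-new tensor and the name `b` is simply rebound to it:

```python
import torch

a = torch.tensor(1.).requires_grad_()
a.grad = torch.tensor(0.9)

b = a.grad
b = b - (b * 0.1)     # out-of-place: result is a freshly allocated tensor
print(b is a.grad)    # False: b now points at the new tensor
print(b)              # tensor(0.8100)
print(a.grad)         # tensor(0.9000): the original was never modified
```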

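Coming back to the SGD step that prompted this: if I understand it right, the in-place form matters because everything else keeps referring to the same parameter tensor. A minimal toy sketch in plain PyTorch (the `x`, `y`, `w`, and `lr` values are just my own example, and I am assuming the usual pattern of running the update under `torch.no_grad()`):

```python
import torch

# Toy data: want w * x to approximate y (the true w is 2).
x = torch.tensor([1., 2., 3.])
y = torch.tensor([2., 4., 6.])
w = torch.tensor(0.5, requires_grad=True)
lr = 0.1

loss = ((w * x - y) ** 2).mean()
loss.backward()

with torch.no_grad():  # the update step itself should not be tracked
    w -= w.grad * lr   # in-place: w stays the same tensor object
w.grad.zero_()         # reset the gradient for the next step

print(w)  # updated value; w is still the same leaf tensor
```

With `w = w - (w.grad * lr)` instead, the name `w` would be rebound to a new tensor, while anything else still holding the old parameter (for example a list of params) would keep seeing the stale values.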