**a** : is the weight tensor (which also stores its gradient in **a.grad** ) that our model will learn during training (y = x @ a)

**a.grad** is an attribute of the **a** tensor where the gradient of **a** is stored

a.grad is calculated after each call of the **loss.backward()** function
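A minimal sketch of this (the shapes and the mean-squared-error loss are assumptions for illustration): before the first **loss.backward()** call, **a.grad** is `None`; after it, **a.grad** holds a gradient tensor with the same shape as **a**.

```python
import torch

# Toy setup: y = x @ a with made-up shapes (assumptions for illustration)
x = torch.randn(5, 3)
a = torch.randn(3, 1, requires_grad=True)  # ask pytorch to track gradients for a
y_true = torch.randn(5, 1)

y_pred = x @ a
loss = ((y_pred - y_true) ** 2).mean()  # mean squared error (an assumed loss)

print(a.grad)        # None: no gradient has been computed yet
loss.backward()      # computes d(loss)/d(a) and stores it in a.grad
print(a.grad.shape)  # same shape as a
```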

Then **a** is updated like this

**a = a - lr * a.grad**

which can be written like this

**a -= lr * a.grad**

And in PyTorch, it is written like this

**a.sub_(lr*a.grad)**
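A small sketch of this update step (the tensor values and learning rate are assumptions for illustration). Note that the in-place update on a tensor that requires grad must happen inside **torch.no_grad()**, otherwise PyTorch raises an error:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (a ** 2).sum()
loss.backward()            # a.grad is now 2 * a = [2., 4., 6.]

lr = 0.1
with torch.no_grad():      # don't track the update itself
    a.sub_(lr * a.grad)    # in-place: same as a -= lr * a.grad

print(a)  # tensor([0.8000, 1.6000, 2.4000], requires_grad=True)
```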

it's called an in-place sub() because it directly updates the **a** tensor in place

(by the way, if you see a function whose name ends with `_`, like sub_(), it means it's the in-place version of the corresponding function (like add and add_) : it's a convention)

Once we finish updating the a tensor, we have to reset **a.grad** to zero ( **a.grad.zero_()** ) before calling the next loss.backward() function.

As for **with torch.no_grad():** , we use it to tell PyTorch to stop tracking gradients for **a** (a.grad is already computed by the loss.backward() call) while we are updating the **a** tensor
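Putting all the steps above together, a minimal training-loop sketch might look like this (the data, learning rate, and number of iterations are assumptions for illustration):

```python
import torch

torch.manual_seed(0)

# Synthetic data: y = x @ true_a, where true_a is what we hope to recover
x = torch.randn(100, 2)
true_a = torch.tensor([[3.0], [-2.0]])
y = x @ true_a

a = torch.randn(2, 1, requires_grad=True)  # weight tensor to learn
lr = 0.1

for epoch in range(100):
    y_pred = x @ a
    loss = ((y_pred - y) ** 2).mean()
    loss.backward()              # fills a.grad
    with torch.no_grad():        # pause gradient tracking for the update
        a.sub_(lr * a.grad)      # in-place SGD step: a -= lr * a.grad
        a.grad.zero_()           # reset the gradient before the next backward()

print(a)  # should be close to true_a
```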