Please let me know if my understanding is correct.

We set **requires_grad=True** for **a** because we want to optimize this parameter.

We call **loss.backward()** because we want the gradient of the loss with respect to **a**, so we can update **a** in the direction that minimizes the loss.
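For example, I think this tiny standalone check shows both points (the input and target values here are made up just for illustration): with **requires_grad=True**, autograd tracks the operations on **a**, and **loss.backward()** fills `a.grad` with dL/da.

```python
import torch

x = torch.tensor([[1., 1.], [2., 1.], [3., 1.]])  # made-up inputs with a bias column
y = torch.tensor([2., 4., 6.])                    # made-up targets
a = torch.tensor([-5., 5.], requires_grad=True)   # parameter we want to optimize

y_hat = x @ a                                     # forward pass
loss = ((y_hat - y) ** 2).mean()                  # MSE loss
loss.backward()                                   # computes dL/da
print(a.grad)                                     # a.grad now holds the gradient of the loss w.r.t. a
```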

thanks

```python
import torch

# x, y, mse, lr, and t are assumed to be defined elsewhere in the notebook
a = torch.tensor([-5., 5.], requires_grad=True)

def update():
    # calculate y_hat
    y_hat = x @ a
    # calculate loss
    loss = mse(y, y_hat)
    # print loss every 10 loops
    if t % 10 == 0: print(loss)
    # compute the derivatives by calling .backward()
    loss.backward()
    # to prevent tracking history (and using memory), wrap the update in torch.no_grad()
    with torch.no_grad():
        a.sub_(lr * a.grad)  # w(t) = w(t-1) - lr * dL/dw(t-1)
        a.grad.zero_()       # reset the gradient so it doesn't accumulate across calls
```
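For completeness, my understanding is that the snippet above relies on `x`, `y`, `mse`, and `lr` being defined earlier in the notebook, with `t` coming from the training loop. A minimal sketch of that setup (the data and learning rate here are just placeholder assumptions, not the notebook's exact values) would be:

```python
import torch

n = 100
x = torch.ones(n, 2)
x[:, 0].uniform_(-1., 1.)                       # random inputs plus a column of ones for the bias
y = x @ torch.tensor([3., 2.]) + torch.rand(n)  # noisy targets from an assumed "true" line

def mse(y, y_hat):
    return ((y_hat - y) ** 2).mean()

lr = 1e-1

for t in range(100):   # t is read as a global inside update()
    update()           # forward pass, backward pass, and in-place SGD step on a
```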