Lesson 2: gradient descent

Please let me know if my understanding is correct

We set requires_grad=True on a because a is the parameter we want to optimize.
We call loss.backward() because it computes the gradient of the loss with respect to a, which is what we need to update a and minimize the loss.


a = torch.tensor([-5., 5.], requires_grad=True)

def update():
    # calculate y_hat
    y_hat = x @ a
    # calculate loss
    loss = mse(y, y_hat)
    # print loss every 10 loops
    if t % 10 == 0: print(loss)
    # compute the derivatives by calling .backward()
    loss.backward()
    # To prevent tracking history (and using memory), wrap the update in torch.no_grad():
    with torch.no_grad():
        a.sub_(lr * a.grad)  # w(t) = w(t-1) - lr * dL/dw(t-1)
        a.grad.zero_()       # reset the gradient, or it accumulates across calls
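To check that the update rule actually drives the loss down, here is a self-contained sketch of the same loop. The data x, y, the mse helper, and the learning rate lr are made up for illustration (they aren't shown in the lesson snippet above); the data is generated so the true parameters are [3., 2.]:

```python
import torch

# made-up data: y = 3*x1 + 2*x2, so the optimal a is [3., 2.]
n = 100
x = torch.ones(n, 2)
x[:, 0] = torch.linspace(-1., 1., n)
y = x @ torch.tensor([3., 2.])

def mse(y, y_hat):
    return ((y - y_hat) ** 2).mean()

a = torch.tensor([-5., 5.], requires_grad=True)
lr = 0.5  # assumed learning rate for this toy problem

for t in range(100):
    y_hat = x @ a
    loss = mse(y, y_hat)
    loss.backward()            # compute dL/da, stored in a.grad
    with torch.no_grad():      # don't track the update itself
        a.sub_(lr * a.grad)    # a <- a - lr * dL/da
        a.grad.zero_()         # clear the gradient for the next iteration

print(a)  # should be close to [3., 2.]
```

After 100 iterations a recovers the parameters used to generate the data, which confirms the three pieces work together: backward() fills a.grad, the no_grad() block keeps the update out of the autograd graph, and zero_() stops gradients from accumulating.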