I'm reviewing SGD and I cannot understand…

This subtracts (learning rate * gradient) from the coefficients…

He says: "If I move the whole thing upwards, the loss goes down. So I want to do the opposite of the thing that makes it go up. We want our loss to be small. That’s why we subtract."

Even so, what did he mean by moving everything upwards and having less loss?

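def update():
    y_hat = x@a                      # predictions for the current coefficients
    loss = mse(y_hat, y)
    if t % 10 == 0: print(loss)
    loss.backward()                  # fills a.grad; without this a.grad is None
    with torch.no_grad():
        a.sub_(lr * a.grad)          # step against the gradient
        a.grad.zero_()               # reset so gradients don't accumulate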


I’m not sure, but since a_1 is initialized to -1 and our target a_1 is 3, I think "upwards" means making a_1 bigger so that it approaches 3.
If you just draw the line, changing the slope from -1 to 3 looks like it is going upwards.
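To check this intuition numerically, here is a rough sketch in plain Python (no torch, gradients worked out by hand). Only the starting a_1 = -1 and target a_1 = 3 come from the thread; the intercept values (start 1, target 2), the evenly spaced x values, and lr = 0.1 are my own assumptions for illustration.

```python
# Toy data: y = 3*x + 2. Target a_1 = 3 is from the thread;
# the target intercept a_2 = 2 and the x values are assumptions.
xs = [i / 10 for i in range(-10, 11)]   # 21 evenly spaced points in [-1, 1]
ys = [3.0 * x + 2.0 for x in xs]

def mse_and_grads(a1, a2):
    """Return the MSE loss and its partial derivatives w.r.t. a1 and a2."""
    n = len(xs)
    errs = [(a1 * x + a2) - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errs) / n
    grad_a1 = sum(2 * e * x for e, x in zip(errs, xs)) / n
    grad_a2 = sum(2 * e for e in errs) / n
    return loss, grad_a1, grad_a2

lr = 0.1                                 # assumed learning rate
a1, a2 = -1.0, 1.0                       # a_1 = -1 from the thread; a_2 = 1 assumed
loss_before, g1, g2 = mse_and_grads(a1, a2)

# Same rule as a.sub_(lr * a.grad): subtract lr * gradient.
new_a1 = a1 - lr * g1
new_a2 = a2 - lr * g2
loss_after, _, _ = mse_and_grads(new_a1, new_a2)

print(f"grad wrt a_1: {g1:.3f}")
print(f"a_1: {a1} -> {new_a1:.3f}")
print(f"loss: {loss_before:.3f} -> {loss_after:.3f}")
```

The gradient with respect to a_1 comes out negative here, which says "increasing a_1 lowers the loss". Subtracting a negative number moves a_1 up, toward 3, and the loss drops.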

You can refer to this, towards the center of the page.

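At the point shown in the current picture, we need to subtract to reach the minimum,
and if we were on the left-hand side of the curve, we would need to add to reach the global minimum. Subtracting lr * gradient covers both cases, because the gradient is negative on the left-hand side, so subtracting it adds.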
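A small sketch of the same idea, assuming a made-up one-parameter loss (w - 3)^2 with its minimum at w = 3. The function names, the starting points 5 and -1, and lr = 0.1 are all hypothetical; the point is just that one update rule moves in the right direction from either side.

```python
def loss(w):
    # Hypothetical bowl-shaped loss with its minimum at w = 3.
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of (w - 3)^2 with respect to w.
    return 2.0 * (w - 3.0)

lr = 0.1  # assumed learning rate

for start in (5.0, -1.0):
    g = grad(start)
    stepped = start - lr * g          # the usual "subtract lr * gradient" step
    side = "right" if start > 3.0 else "left"
    print(f"{side} of minimum: w={start}, grad={g:+.1f}, "
          f"after step w={stepped:.2f}, "
          f"loss {loss(start):.2f} -> {loss(stepped):.2f}")
```

On the right of the minimum the gradient is positive, so the step decreases w; on the left it is negative, so the same subtraction increases w. Either way the loss gets smaller.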