In the gradient descent intro we have:
```python
def upd():
    global a_guess, b_guess
    # make a prediction using the current weights
    y_pred = lin(a_guess, b_guess, x)
    # calculate the derivative of the loss
    dydb = 2 * (y_pred - y)
    dyda = x * dydb
    # update our weights by moving in the direction of steepest descent
    a_guess -= lr * dyda.mean()
    b_guess -= lr * dydb.mean()
```
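For context, this is roughly the setup I'm assuming around `upd()`. The quote doesn't show `lin`, `x`, `y`, or `lr`, so these definitions are my guess based on how they're used (a simple linear model fit to made-up data):

```python
import numpy as np

def lin(a, b, x):
    # my assumption: a plain linear model, a*x + b
    return a * x + b

# synthetic data around a "true" line y = 3x + 2 (made-up values)
x = np.random.rand(100)
y = 3 * x + 2 + np.random.randn(100) * 0.1

a_guess, b_guess = 0.0, 0.0  # initial weights
lr = 0.1                     # learning rate
```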
But I don’t fully understand dyda: why does the partial derivative with respect to a multiply x by dydb? We aren’t calculating the full derivative, and I don’t think this is a case of the chain rule.
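For reference, here is the loss I assume is being differentiated (mean squared error is my assumption; the notebook may define it differently), with $\hat y_i = a x_i + b$:

$$L = \frac{1}{n}\sum_{i=1}^{n}\bigl(\hat y_i - y_i\bigr)^2$$

$$\frac{\partial L}{\partial b} = \frac{1}{n}\sum_{i} 2\,(\hat y_i - y_i), \qquad \frac{\partial L}{\partial a} = \frac{1}{n}\sum_{i} 2\,(\hat y_i - y_i)\,x_i$$

I can see that differentiating with respect to a is what pulls out the extra factor of $x_i$, which seems to be where `dyda = x * dydb` comes from in the code, but I'm not sure which rule justifies that step.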
In the case of more than two dimensions, would we still be using `.mean()` to decide the direction of steepest descent? I've sketched what I imagine below.
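With a weight vector `w` of length `d`, this is my own sketch (not from the notebook) of how I'd guess the update generalizes, with `.mean()` just averaging the per-sample gradients for each weight:

```python
import numpy as np

def upd_nd(X, y, w, lr):
    # X: (n_samples, d) inputs, w: (d,) weights, y: (n_samples,) targets
    y_pred = X @ w                       # linear prediction for every sample
    grad_per_sample = 2 * (y_pred - y)   # dL/dy_pred for each sample
    # per-weight gradient, averaged over the samples
    dw = (X * grad_per_sample[:, None]).mean(axis=0)
    return w - lr * dw                   # one gradient descent step
```

Is that right, i.e. is `.mean()` only averaging the gradient over the batch rather than "deciding" the direction itself?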