In the gradient descent intro we have:
```python
def upd():
    global a_guess, b_guess
    # make a prediction using the current weights
    y_pred = lin(a_guess, b_guess, x)
    # calculate the derivative of the loss
    dydb = 2 * (y_pred - y)
    dyda = x * dydb
    # update our weights by moving in the direction of steepest descent
    a_guess -= lr * dyda.mean()
    b_guess -= lr * dydb.mean()
```

But I don't fully understand `dyda`: why does the partial derivative with respect to `a` multiply `x` by `dydb`? We aren't calculating the full derivative, and I don't think this is a case of the chain rule.
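To make my confusion concrete, here is a small check I tried, assuming `lin(a, b, x) = a*x + b` and a mean-squared-error loss (my assumption about what the intro uses). It compares the analytic `dyda` / `dydb` from the lesson code against finite differences of the loss:

```python
import numpy as np

np.random.seed(0)
x = np.random.rand(20)
y = 3.0 * x + 2.0               # ground truth: a=3, b=2
a_guess, b_guess = 1.0, 1.0

def loss(a, b):
    # mean squared error of the linear model a*x + b
    return ((a * x + b - y) ** 2).mean()

# analytic gradients, as in the lesson code
y_pred = a_guess * x + b_guess
dydb = 2 * (y_pred - y)
dyda = x * dydb

# central finite differences of the loss
eps = 1e-6
num_da = (loss(a_guess + eps, b_guess) - loss(a_guess - eps, b_guess)) / (2 * eps)
num_db = (loss(a_guess, b_guess + eps) - loss(a_guess, b_guess - eps)) / (2 * eps)

print(np.isclose(dyda.mean(), num_da))  # True
print(np.isclose(dydb.mean(), num_db))  # True
```

So the numbers do match the finite-difference gradient; my question is about *why* the `x *` factor appears.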
In the case of more than 2 dimensions, would we still use `.mean()` to determine the direction of steepest descent?
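For reference, here is my own attempt at a many-parameter version (my sketch, not from the intro), where each weight gets its own partial derivative and `.mean()` still averages over the data points, not over the dimensions:

```python
import numpy as np

np.random.seed(0)
n, d = 50, 4                            # 50 samples, 4 features
X = np.random.rand(n, d)
true_w = np.array([1.0, -2.0, 3.0, 0.5])
y = X @ true_w
w = np.zeros(d)
lr = 0.1

for _ in range(5000):
    y_pred = X @ w
    # dL/dy_pred for squared error: one value per sample
    dydp = 2 * (y_pred - y)
    # partial derivative w.r.t. each weight, averaged over the samples
    grad = (X * dydp[:, None]).mean(axis=0)
    w -= lr * grad

print(np.allclose(w, true_w, atol=1e-2))
```

Is this the right way to think about it, i.e. the mean is over the batch and the gradient vector itself has one entry per weight?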