#1

I’m looking at the `basic sgd` sheet in `graddesc.xlsm`.

In cell F4 (`err^2`), the error function is defined as `(y-y prediction)^2`.

Written out as a formula, this gives `(y-(b+ax))^2`.

Taking the derivative with respect to b would then give `-2(ax+b-y)` instead of `2(ax+b-y)` (which is what’s in the worksheet). This would flip the sign of the `de/db` value.

I guess `(y-(b+ax))^2` can be converted into `((b+ax)-y)^2` by multiplying the inside of the square by -1, but I’m wondering: was there any reason for this? Or was it just to make the formula easier to differentiate?

Apologies if I made a mistake in my math.

Edit: Maybe to simplify my question – it seems that the error function uses `([actual y]-[predicted y])^2`, but when taking the derivative of the function, it’s assumed that the formula is `([predicted y]-[actual y])^2`. I know that the order of subtraction doesn’t matter for the actual error result since we’re taking the square, but it does seem to affect the sign of one of the partial derivatives, so I’m wondering why this discrepancy exists.

Wiki: Lesson 5
#2

As you said, the order of subtraction doesn’t matter, and it doesn’t affect the partial derivative result either.
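A shorter check that skips the full expansion is the chain rule. For `(y - (ax + b))^2`, the outer factor is `2(y - (ax + b))` and the inner derivative with respect to b is -1, so the two minus signs cancel. For `((ax + b) - y)^2`, the inner derivative is +1 and nothing needs cancelling:

$$
\begin{aligned}
\frac{\partial}{\partial b}\bigl(y - (ax + b)\bigr)^2 &= 2\bigl(y - (ax + b)\bigr)\cdot(-1) = 2(ax + b - y) \\
\frac{\partial}{\partial b}\bigl((ax + b) - y\bigr)^2 &= 2\bigl((ax + b) - y\bigr)\cdot(+1) = 2(ax + b - y)
\end{aligned}
$$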

Step by step, writing z for y_pred (so z = ax + b), we have:

$$
\begin{aligned}
(y - z)^2 &= y^2 - 2yz + z^2 \\
&= y^2 - 2y(ax + b) + (ax + b)^2 \\
&= y^2 - 2axy - 2by + a^2x^2 + 2abx + b^2
\end{aligned}
$$

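Then we differentiate with respect to b. Only the terms containing b (-2by, 2abx and b^2) contribute:

$$
\frac{\partial}{\partial b}(y - z)^2 = -2y + 2ax + 2b = 2(ax + b - y)
$$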
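To double-check the sign numerically, here is a minimal Python sketch. It compares a central finite-difference estimate of the derivative of `(y - (ax + b))^2` with respect to b against the closed form `2(ax + b - y)`. The function names and the values of a, b, x, y are hypothetical, chosen only for illustration; they do not come from the worksheet.

```python
# Numerically check d/db (y - (a*x + b))**2 against the closed form 2*(a*x + b - y).
# All names and values here are illustrative, not taken from graddesc.xlsm.

def err_actual_minus_pred(a, b, x, y):
    """Squared error written as (actual - predicted)^2."""
    return (y - (a * x + b)) ** 2

def numeric_de_db(f, a, b, x, y, h=1e-6):
    """Central finite-difference estimate of the partial derivative of f w.r.t. b."""
    return (f(a, b + h, x, y) - f(a, b - h, x, y)) / (2 * h)

def analytic_de_db(a, b, x, y):
    """Closed-form derivative from the expansion: 2(ax + b - y)."""
    return 2 * (a * x + b - y)

a, b, x, y = 1.5, 0.5, 2.0, 4.0  # arbitrary example values

print(numeric_de_db(err_actual_minus_pred, a, b, x, y))  # very close to -1.0
print(analytic_de_db(a, b, x, y))  # → -1.0
```

Because the error is quadratic in b, the central difference is exact apart from floating-point rounding, so the numeric estimate lands on -1.0 (matching `2(ax + b - y)`) rather than +1.0, which is what `-2(ax + b - y)` would give.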

You can also see that there is no difference if we expand (z - y)^2 instead:

$$
\begin{aligned}
(z - y)^2 &= z^2 - 2zy + y^2 \\
&= (ax + b)^2 - 2(ax + b)y + y^2 \\
&= a^2x^2 + 2abx + b^2 - 2axy - 2by + y^2
\end{aligned}
$$

Then we differentiate with respect to b:

$$
\frac{\partial}{\partial b}(z - y)^2 = 2ax + 2b - 2y = 2(ax + b - y)
$$
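To see that the ordering also leaves a gradient-descent update unchanged, here is a hedged Python sketch of one SGD step for y_pred = ax + b. It is not a transcription of the `basic sgd` sheet: the data point, starting values, learning rate and function names are all hypothetical. It also uses the derivative with respect to a, `2x(ax + b - y)`, which follows from the same expansion.

```python
# Sketch: one SGD update for y_pred = a*x + b, using each ordering of the squared error.
# Data, starting parameters, learning rate and function names are hypothetical.

def grads_actual_minus_pred(a, b, x, y):
    """Gradients of (y - (ax + b))^2; the inner derivative contributes -x for a and -1 for b."""
    r = y - (a * x + b)  # residual: actual - predicted
    de_da = 2 * r * (-x)
    de_db = 2 * r * (-1)
    return de_da, de_db

def grads_pred_minus_actual(a, b, x, y):
    """Gradients of ((ax + b) - y)^2; the inner derivative contributes +x for a and +1 for b."""
    r = (a * x + b) - y  # residual: predicted - actual
    de_da = 2 * r * x
    de_db = 2 * r
    return de_da, de_db

def sgd_step(a, b, x, y, lr, grad_fn):
    """Move a and b one small step against the gradient."""
    de_da, de_db = grad_fn(a, b, x, y)
    return a - lr * de_da, b - lr * de_db

a, b = 1.0, 1.0  # hypothetical starting parameters
x, y, lr = 2.0, 7.0, 0.01  # hypothetical data point and learning rate

# Both calls print the same (a, b) pair.
print(sgd_step(a, b, x, y, lr, grads_actual_minus_pred))
print(sgd_step(a, b, x, y, lr, grads_pred_minus_actual))
```

In each version the residual and the inner derivative flip sign together, so both give de/da = -16 and de/db = -8 at this point, and the parameter updates are identical.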