I was watching lesson 8 and then read https://explained.ai/matrix-calculus/index.html, but one thing is still not clear to me: the `lin_grad` function.

My understanding of backprop is that it's an iterative approach to calculating partial derivatives. Let's say we have `f = mse(relu(x@w + b))`. We can substitute intermediate variables and say

```
u = x@w + b
v = relu(u)
z = mse(v)
```
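Concretely, here is the forward pass with some made-up shapes (batch size `n=4`, 3 input features, 2 output features; the names and numbers are just mine, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_in, n_out = 4, 3, 2                  # batch size, input features, output features
x = rng.standard_normal((n, n_in))
w = rng.standard_normal((n_in, n_out))
b = rng.standard_normal(n_out)            # broadcast across the batch dimension
targ = rng.standard_normal((n, n_out))    # dummy target, just so mse has something to compare to

u = x @ w + b                             # linear layer, shape (n, n_out)
v = np.maximum(u, 0.0)                    # relu, same shape
z = ((v - targ) ** 2).mean()              # mse, a scalar
```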

Since `f = z`, the chain rule gives `df/dx = dz/dv * dv/du * du/dx`. As we run backprop: after `mse_grad` we will have `inp.g = dz/dv`, after `relu_grad` `inp.g = dz/dv * dv/du`, and after `lin_grad` `inp.g = dz/dv * dv/du * du/dx`. Is anything wrong so far?
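For reference, the backward functions I'm asking about look roughly like this; I've reconstructed them from memory in plain numpy (returning the gradients instead of stashing them in `.g`), so details may differ from the lesson notebook:

```python
import numpy as np

def mse_grad(inp, targ):
    # dz/dv: gradient of ((inp - targ)**2).mean() w.r.t. inp
    return 2.0 * (inp - targ) / inp.size

def relu_grad(inp, out_g):
    # dv/du folded into the incoming gradient: pass it through where inp > 0
    return (inp > 0).astype(inp.dtype) * out_g

def lin_grad(inp, w, out_g):
    # du/dx, du/dw, du/db folded into the incoming gradient out_g
    inp_g = out_g @ w.T      # gradient w.r.t. x
    w_g = inp.T @ out_g      # gradient w.r.t. w
    b_g = out_g.sum(0)       # gradient w.r.t. b
    return inp_g, w_g, b_g
```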

**d/db**

`du/db = 1` (a vector of ones). I get the part where we multiply that 1 by `out.g`, but why do we have to do `.sum(0)`? Is that because the vector is broadcast along dimension 0 during the forward pass?
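Here is what I mean by the broadcast, as a toy sketch (shapes made up):

```python
import numpy as np

b = np.array([10.0, 20.0])       # shape (2,)
u_partial = np.zeros((3, 2))     # stands in for x @ w, with batch size 3
u = u_partial + b                # b is broadcast along dim 0: every row gets a copy,
                                 # so b effectively contributes 3 times in the forward pass
```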

**d/dw**

I do understand that we have to perform `out.g (some_operation) x`, but I don't fully understand whether that operation should be `*` or `@`. I know that `*` doesn't work, but I'd like some theoretical understanding instead of just making the dimensions match.
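What I can see from the shapes alone (illustrative shapes only, same made-up sizes as above):

```python
import numpy as np

n, n_in, n_out = 4, 3, 2
x = np.ones((n, n_in))           # (4, 3)
out_g = np.ones((n, n_out))      # (4, 2)
# elementwise * fails: (4, 3) and (4, 2) don't broadcast,
# but x.T @ out_g gives (3, 2), which matches w's shape
w_g = x.T @ out_g
```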

**d/dx**

This component is completely unclear to me. My expectation was that it should be `out.g (some_operation) w.t`. Again, I get that the dimensions don't allow elementwise multiplication, but I'd like a theoretical understanding of why that is.
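The same shape bookkeeping, for what it's worth (again, shapes are made up):

```python
import numpy as np

n, n_in, n_out = 4, 3, 2
w = np.ones((n_in, n_out))       # (3, 2)
out_g = np.ones((n, n_out))      # (4, 2)
# out_g * w.T fails: (4, 2) vs (2, 3) don't broadcast,
# but out_g @ w.T gives (4, 3), which matches x's shape
x_g = out_g @ w.T
```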

It is entirely possible that all answers are in https://explained.ai/matrix-calculus/index.html#sec4.3, but I am still unable to connect the dots.