I started reading the paper that Jeremy and Terence wrote on Matrix Calculus for DL last night but am having a problem understanding the following:

In section 4.2, where it covers derivatives of scalar-vector products, there's a line that says:

"We know that element-wise operations imply that f_i is purely a function of w_i and g_i is purely a function of x_i." I am very confused about this claim. It might already be explained in the text, but I don't understand why it has to be true. Since f is a function of multiple parameters (which we group into the vector w), why can we not have f_1 = w_1 * w_2?
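For what it's worth, here's a small numerical check I put together (my own illustration, not from the paper). It computes a finite-difference Jacobian for an element-wise product y_i = w_i * x_i: the off-diagonal entries come out zero, which is the sense in which each f_i depends only on w_i. Something like f_1 = w_1 * w_2 would be a valid function, but it wouldn't be an element-wise operation, so it falls outside the case the paper is analyzing.

```python
# Illustrative sketch (function names are mine): for an element-wise
# operation, the Jacobian dy_i/dw_j is diagonal, i.e. y_i depends
# only on w_i. Pure-Python finite differences, no dependencies.

def elementwise_times(w, x):
    # y_i = f_i(w) * g_i(x) with f_i(w) = w_i and g_i(x) = x_i
    return [wi * xi for wi, xi in zip(w, x)]

def jacobian_wrt_w(fn, w, x, h=1e-6):
    # Numerical Jacobian J[i][j] = dy_i / dw_j via forward differences.
    y0 = fn(w, x)
    J = []
    for i in range(len(y0)):
        row = []
        for j in range(len(w)):
            wp = list(w)
            wp[j] += h
            row.append((fn(wp, x)[i] - y0[i]) / h)
        J.append(row)
    return J

w = [2.0, 3.0, 5.0]
x = [7.0, 11.0, 13.0]
J = jacobian_wrt_w(elementwise_times, w, x)
# Diagonal entries are x_i (since dy_i/dw_i = x_i);
# off-diagonal entries are (numerically) zero.
```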

In section 4.4, it talks about summing over a vector of functions, i.e. sum(f(x)), but then later it says "Let's look at the gradient of the simple y = sum(x). The function inside the summation is just f_i(x) = x_i" and also "f_i(zx) = zx_i."

I don't quite understand why f_i is again dependent on only one of the components of x.

Hey, thanks for the reply. Could you elaborate a little on how that implies what the paper is saying? Specifically, can you expand on how "we are considering only the element-wise operation" leads to "f_i(x) = x_i" and "f_i(zx) = zx_i"?

Not related to your questions, but here's a useful tip for writing out math equations so they're easier to read: if you surround an equation with single $ symbols, it will be rendered to look proper.

For example, $y = mx + c$ will be rendered as y = mx + c.