I am going through Jeremy’s paper about the algebra behind backpropagation, and I am struggling to follow its derivation for element-wise operations. I assume I am missing something, rather than the paper being confused, but I cannot put my finger on what.
So in the paragraph “Derivatives of vector element-wise binary operators”, once the Jacobian (J) of the element-wise operation on two vectors F and G has been written out, we look for a condition under which J is diagonal.
The argument goes:
"We know that element-wise operations imply that fi is purely a function of wi and gi is purely a function of xi. For example, w + x sums wi + xi. Consequently, fi(w) O gi(x) reduces to fi(wi) O gi(xi)"
What puzzles me is that the element-wise operation here is between F and G, not between W and X, which are the parameters of F and G. So the element-wise operation between F and G does not at all imply that fi depends only on wi, or that gi depends only on xi. If Y = F O G, then yi depends only on fi and gi, fine. But that cannot be carried over to the parameters of F or G.
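To make the objection concrete, here is a minimal NumPy sketch. It is my own illustration, not something from the paper: the helper names `numerical_jacobian` and `is_diagonal`, the mixing matrix `M`, and the choice of multiplication as the element-wise operator are all assumptions. It compares the Jacobian of Y = F(W) O G(X) with respect to W when F is the identity against the case where each fi depends on several components of W.

```python
import numpy as np

def numerical_jacobian(func, w, eps=1e-6):
    """Central-difference Jacobian of func at w (rows: outputs, columns: inputs)."""
    w = np.asarray(w, dtype=float)
    n_out = func(w).shape[0]
    J = np.zeros((n_out, w.size))
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        J[:, j] = (func(w + step) - func(w - step)) / (2 * eps)
    return J

def is_diagonal(J, tol=1e-6):
    """True if every off-diagonal entry of J is zero within tol."""
    return np.allclose(J - np.diag(np.diag(J)), 0.0, atol=tol)

w = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0])

# Case 1: F(W) = W and G(X) = X, so fi depends only on wi.
def y_identity(w_):
    return w_ * x

# Case 2 (hypothetical): F(W) = M @ W, so each fi mixes several wj.
M = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])

def y_mixed(w_):
    return (M @ w_) * x

J_identity = numerical_jacobian(y_identity, w)
J_mixed = numerical_jacobian(y_mixed, w)

print("F = identity, Jacobian diagonal:", is_diagonal(J_identity))
print("F = M @ W,    Jacobian diagonal:", is_diagonal(J_mixed))
```

In both cases the operation between F and G is element-wise, yet only the first Jacobian with respect to W comes out diagonal, which is why the "fi is purely a function of wi" step looks like an extra assumption to me.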
Actually, this seems to be an abuse of notation, since the element-wise operator acts on two scalars. Those scalars can of course depend on the whole vectors W and X, as long as we respect the fact that the element-wise operation between F and G pairs only Fi with Gi.
Finally, we ask under which condition the Jacobian matrix is diagonal, and then conclude that “by definition of the element-wise operation” it is diagonal outright, which would mean that no condition is needed at all. So the logic is weird.
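Written out with the chain rule, this is how I currently see it. This is only my tentative reading, and the symbols d_i and the general f_i(w) are my own notation rather than anything quoted from the paper:

```latex
% y_i = f_i(\mathbf{w}) \bigcirc g_i(\mathbf{x}), with f_i allowed to depend on all of \mathbf{w}
\frac{\partial y_i}{\partial w_j}
  = \underbrace{\frac{\partial \,(f_i \bigcirc g_i)}{\partial f_i}}_{d_i}
    \;\frac{\partial f_i}{\partial w_j}
\qquad\Longrightarrow\qquad
\frac{\partial \mathbf{y}}{\partial \mathbf{w}}
  = \operatorname{diag}(d_1, \dots, d_n)\,
    \frac{\partial \mathbf{f}}{\partial \mathbf{w}}

% The Jacobian with respect to \mathbf{f} is diagonal with no condition:
\frac{\partial y_i}{\partial f_j} = 0 \quad \text{for } j \neq i

% The Jacobian with respect to \mathbf{w} is diagonal when
\frac{\partial f_i}{\partial w_j} = 0 \quad \text{for } j \neq i
```

So the diagonal structure with respect to F and G comes for free from the element-wise operator, while the diagonal structure with respect to W seems to need the separate condition that fi depends only on wi.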
Has anyone understood what this is supposed to mean?