How to calculate the derivative of a matrix with respect to another matrix, dZ/dW?

@Jeremy @parrt I have a fundamental question about the matrix calculus paper: it covers derivatives of vectors with respect to vectors (the Jacobian matrix), but it doesn't show how to take the derivative of a matrix with respect to a matrix, which is needed for the gradient calculations in neural networks.

For a neural network hidden layer with multiple neurons, the equation is Z = WX + B(1^T), where Z, W, X, and B are matrices rather than vectors. W has dimensions n(l) x n(l-1), where n(l) is the number of neurons in the layer and n(l-1) is the number of neurons in the previous layer.
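To make the shapes concrete, here is a minimal NumPy sketch of the forward pass I have in mind (the layer sizes and batch size m are just example values, and I'm treating B as an n(l) x 1 bias column so that B(1^T) comes out n(l) x m):

```python
import numpy as np

n_l, n_lm1, m = 4, 3, 5          # layer size, previous layer size, batch size (example values)

W = np.random.randn(n_l, n_lm1)  # weights: n(l) x n(l-1)
X = np.random.randn(n_lm1, m)    # previous layer's activations: n(l-1) x m
B = np.random.randn(n_l, 1)      # bias column: n(l) x 1
ones = np.ones((1, m))           # 1^T, replicates the bias across the batch

Z = W @ X + B @ ones             # Z: n(l) x m
print(Z.shape)                   # (4, 5)
```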

For the gradient with respect to the parameters, how do I calculate dZ/dW? What does d(matrix)/d(matrix) even mean?
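To illustrate what I'm asking: if I just hand the whole thing to autodiff (a sketch assuming PyTorch and its torch.autograd.functional.jacobian helper), the full Jacobian of Z with respect to W comes back as a four-dimensional object, and I don't know how to interpret or work with that:

```python
import torch

n_l, n_lm1, m = 4, 3, 5
X = torch.randn(n_lm1, m)
B = torch.randn(n_l, 1)

def layer(W):
    # Z = WX + B(1^T); the bias column broadcasts across the batch
    return W @ X + B

W = torch.randn(n_l, n_lm1)
J = torch.autograd.functional.jacobian(layer, W)
print(J.shape)  # torch.Size([4, 5, 4, 3]) — one entry per (Z_ij, W_kl) pair
```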

I'd appreciate it if someone could shed some light on this.