Can anyone tell me where the gradient formula used when writing NMF from scratch comes from?

All the NMF demonstrations in this lecture use a common and consistent energy functional given by: E = (1/2) * [ Trace(R.T @ R) + lamda * Trace((mu - H).T @ (mu - H)) + lamda * Trace((mu - W).T @ (mu - W)) ], where lamda = 0 wherever H, W > mu. (The trace is taken term by term because the three matrix products generally have different shapes.)

The first term is the squared Frobenius norm of the residual matrix, R = M - WH. The second and third terms are quadratic regularizers which, in the case of ill-posed problems and depending on the size of lamda, keep the solution from taking on undesirable values. (In this case, the undesirable values are negative elements of W and H.)
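Written element by element, the same functional may be easier to read. The following is only a sketch of that reading: it assumes lamda is switched on or off per element (the symbols \lambda^{H}_{ij} and \lambda^{W}_{ij} are introduced here for illustration and do not appear in the lecture).

```latex
\operatorname{Tr}(R^{\top} R) = \sum_{i,j} R_{ij}^{2} = \lVert R \rVert_F^{2},
\qquad R = M - WH

E = \tfrac{1}{2}\lVert M - WH \rVert_F^{2}
  + \tfrac{1}{2}\sum_{i,j} \lambda^{H}_{ij}\,(\mu - H_{ij})^{2}
  + \tfrac{1}{2}\sum_{i,j} \lambda^{W}_{ij}\,(\mu - W_{ij})^{2}

\lambda^{X}_{ij} =
\begin{cases}
  \lambda & \text{if } X_{ij} < \mu \\
  0       & \text{if } X_{ij} > \mu
\end{cases}
\qquad X \in \{H, W\}
```

Under this reading, an element of H or W contributes to the penalty only while it sits below mu.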

Taking partial derivatives of this energy functional w.r.t. H and W leads to the gradient expressions that are computed explicitly and used in the Python notebook.
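To make that step concrete, here is a minimal NumPy sketch of the energy and the gradients that follow from it. This is not the notebook's code: the function names `energy`, `gradients` and `finite_diff_check`, the default values `lamda=1.0` and `mu=0.0`, and the elementwise on/off treatment of lamda are all assumptions made for illustration. Differentiating gives dE/dW = -R @ H.T - lamda * (mu - W) and dE/dH = -W.T @ R - lamda * (mu - H), with the lamda terms active only where the element is below mu.

```python
import numpy as np

def energy(M, W, H, lamda=1.0, mu=0.0):
    """E = 1/2 * (||M - WH||_F^2 + penalties on elements of H, W below mu)."""
    R = M - W @ H
    # np.minimum(X - mu, 0) is nonzero only where X < mu, i.e. lamda = 0 elsewhere
    pen_W = np.minimum(W - mu, 0.0)
    pen_H = np.minimum(H - mu, 0.0)
    return 0.5 * (np.sum(R**2) + lamda * np.sum(pen_H**2) + lamda * np.sum(pen_W**2))

def gradients(M, W, H, lamda=1.0, mu=0.0):
    """Partial derivatives of the energy above w.r.t. W and H."""
    R = M - W @ H
    # d/dW of 1/2 ||R||_F^2 is -R H^T; the penalty adds lamda * (W - mu) where W < mu
    grad_W = -R @ H.T + lamda * np.minimum(W - mu, 0.0)
    # d/dH of 1/2 ||R||_F^2 is -W^T R; the penalty adds lamda * (H - mu) where H < mu
    grad_H = -W.T @ R + lamda * np.minimum(H - mu, 0.0)
    return grad_W, grad_H

def finite_diff_check(seed=0, eps=1e-6):
    """Compare one analytic gradient entry with a central finite difference."""
    rng = np.random.default_rng(seed)
    M = rng.random((5, 4))
    W = rng.standard_normal((5, 2))   # deliberately includes negative entries
    H = rng.standard_normal((2, 4))
    grad_W, _ = gradients(M, W, H)
    dW = np.zeros_like(W)
    dW[1, 0] = eps
    numeric = (energy(M, W + dW, H) - energy(M, W - dW, H)) / (2 * eps)
    return numeric, grad_W[1, 0]

if __name__ == "__main__":
    numeric, analytic = finite_diff_check()
    print("finite difference:", numeric, "analytic:", analytic)
```

A plain gradient-descent update would then be `W -= step * grad_W` and `H -= step * grad_H`, where `step` is a hypothetical step size; the finite-difference check is just a quick way to confirm that the analytic expressions match the energy.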

Why do we use an ‘energy functional’ instead of the derivatives of the Frobenius norm?

Thanks for the information. Keep sharing posts like this.