Can anyone tell me where the gradient formula used when writing NMF from scratch comes from?
All the NMF demonstrations in this lecture use a common, consistent energy functional: E = (1/2) * ( Trace[R.T @ R] + lamda * Trace[(mu - H).T @ (mu - H)] + lamda * Trace[(mu - W).T @ (mu - W)] ), where lamda is set to 0 for every element of H or W that is greater than mu.
The first term is the squared Frobenius norm of the residual matrix, R = M - WH. The second and third terms are quadratic regularizers which, for ill-posed problems, keep the solution from taking on undesirable values; how strongly they do so depends on the size of lamda. (Here, the undesirable values are negative elements of W and H.)
Taking the partial derivatives of this energy functional with respect to H and W leads to the gradient expressions that are computed explicitly and used in the Python notebook.
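For anyone who wants to check the derivation, here is a minimal NumPy sketch of what those partial derivatives work out to. It is not copied from the notebook: the function names (energy, gradients, numerical_grad) are made up for illustration, mu is assumed to be a scalar, and the penalty is assumed to act element by element (active only where an element is below mu). Under those assumptions, dE/dW = -R @ H.T - lamda * (mu - W) and dE/dH = -W.T @ R - lamda * (mu - H), with the penalty part zeroed wherever the element is not below mu. The last few lines compare these against central finite differences.

```python
import numpy as np

def penalty(X, mu):
    # (mu - X) where X is below mu, 0 elsewhere (assumed reading of "lamda = 0 wherever X > mu")
    return np.where(X < mu, mu - X, 0.0)

def energy(M, W, H, lam, mu):
    # E = 0.5 * ( ||M - W H||_F^2 + lam * sum(pen_H^2) + lam * sum(pen_W^2) )
    R = M - W @ H
    return 0.5 * (np.sum(R * R)
                  + lam * np.sum(penalty(H, mu) ** 2)
                  + lam * np.sum(penalty(W, mu) ** 2))

def gradients(M, W, H, lam, mu):
    # Analytic partial derivatives of E with respect to W and H
    R = M - W @ H
    grad_W = -R @ H.T - lam * penalty(W, mu)
    grad_H = -W.T @ R - lam * penalty(H, mu)
    return grad_W, grad_H

def numerical_grad(f, X, eps=1e-6):
    # Central finite differences, one element at a time
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        Xp = X.copy()
        Xm = X.copy()
        Xp[idx] += eps
        Xm[idx] -= eps
        G[idx] = (f(Xp) - f(Xm)) / (2 * eps)
    return G

# Small random problem; some entries of W and H start out negative on purpose
rng = np.random.default_rng(0)
M = rng.random((5, 4))
W = rng.normal(size=(5, 2))
H = rng.normal(size=(2, 4))
lam, mu = 10.0, 0.01

grad_W, grad_H = gradients(M, W, H, lam, mu)
num_W = numerical_grad(lambda X: energy(M, X, H, lam, mu), W)
num_H = numerical_grad(lambda X: energy(M, W, X, lam, mu), H)
print(np.allclose(grad_W, num_W, atol=1e-5), np.allclose(grad_H, num_H, atol=1e-5))
```

A plain gradient-descent step would then be W -= step * grad_W and H -= step * grad_H, where the step size is something you would have to tune.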
Why do we use an ‘energy functional’ instead of the derivatives of the Frobenius norm?