This is a really interesting paper [1] that hasn’t caught on yet [2]. The idea is that a dual formula for mutual information lets you use a neural network to estimate it. The authors then use that estimator to fix GAN mode-dropping. Could be very cool, both as a general MI tool and for GANs.
Compared to covariance, which measures the extent of a linear relationship, Mutual Information (MI) is a measure that doesn’t assume any particular form of dependence between the variables. It is the special case of the Kullback-Leibler divergence that compares the joint distribution p(x,y) with the product of marginals p(x)p(y).
For discrete variables, MI is typically calculated as I(X ; Y) = \sum_{x, y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}
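To make the contrast with covariance concrete, here is a minimal sketch of the discrete formula in NumPy. The helper names (`mutual_information`, `covariance`) and the toy distribution (X uniform on {-1, 0, 1}, Y = X^2) are my own illustrative assumptions, not something from the paper.

```python
import numpy as np

def mutual_information(joint):
    """MI in nats from a 2-D joint probability table p(x, y) (hypothetical helper)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()              # normalize in case counts were passed
    px = joint.sum(axis=1, keepdims=True)    # marginal p(x), shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)    # marginal p(y), shape (1, ny)
    mask = joint > 0                         # treat 0 * log(0) as 0
    ratio = joint[mask] / (px @ py)[mask]    # p(x,y) / (p(x) p(y))
    return float(np.sum(joint[mask] * np.log(ratio)))

def covariance(joint, x_vals, y_vals):
    """Cov(X, Y) computed from the same joint table (hypothetical helper)."""
    joint = np.asarray(joint, dtype=float)
    ex = np.sum(joint.sum(axis=1) * x_vals)
    ey = np.sum(joint.sum(axis=0) * y_vals)
    exy = np.sum(joint * np.outer(x_vals, y_vals))
    return float(exy - ex * ey)

# Toy example (assumed): X uniform on {-1, 0, 1}, Y = X**2 takes values {0, 1}.
x_vals = np.array([-1.0, 0.0, 1.0])
y_vals = np.array([0.0, 1.0])
joint_dep = np.array([[0.0, 1/3],
                      [1/3, 0.0],
                      [0.0, 1/3]])

# Independent variables with the same marginals, for comparison.
joint_ind = np.outer(joint_dep.sum(axis=1), joint_dep.sum(axis=0))

print("cov (dependent):", covariance(joint_dep, x_vals, y_vals))
print("MI  (dependent):", mutual_information(joint_dep))
print("MI  (independent):", mutual_information(joint_ind))
```

Here Y is completely determined by X, yet the covariance is zero because the relationship isn't linear. The MI comes out to H(Y), roughly 0.64 nats, and drops to zero (up to floating-point error) for the independent table.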
The authors then use a dual (i.e. variational) formula for the KL divergence called the Donsker-Varadhan representation. It expresses the divergence as an optimization problem, where you maximize this