How to calculate mutual information with a NN + an application to GAN mode-dropping

This is a really interesting paper [1] that hasn't caught on yet [2]. The idea is that a dual formula for mutual information lets you use a NN to estimate it. The paper then uses this to fix GAN mode-dropping. Could be very cool… both as a general MI tool and for GANs.

[1] Mutual Information Neural Estimation


Unlike covariance, which only measures the strength of a linear relationship, mutual information (MI) doesn't assume any particular kind of functional relationship between the variables. It is the Kullback-Leibler divergence between the joint distribution p(x,y) and the product of the marginals p(x)p(y), so it is zero exactly when X and Y are independent.

For discrete variables, that KL divergence is
I(X ; Y) = \sum_{x, y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}
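To make the contrast with covariance concrete, here is a small plain-Python sketch (the helper names are made up for illustration) that evaluates the formula above on a joint probability table, using the natural log so the result is in nats. The example takes X uniform on {-1, 0, 1} and Y = X², so Y is completely determined by X, yet the covariance is exactly zero while the MI is ln 3 - (2/3) ln 2 ≈ 0.64 nats.

```python
import math

def mutual_information(joint):
    """MI in nats of a discrete joint table, joint[i][j] = p(x_i, y_j)."""
    px = [sum(row) for row in joint]          # marginal p(x): row sums
    py = [sum(col) for col in zip(*joint)]    # marginal p(y): column sums
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:                       # cells with p(x,y) = 0 contribute 0
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi

def covariance(xs, ys, joint):
    """Cov(X, Y) = E[XY] - E[X]E[Y] for the same table, with values xs, ys."""
    ex = sum(x * sum(row) for x, row in zip(xs, joint))
    ey = sum(y * sum(col) for y, col in zip(ys, zip(*joint)))
    exy = sum(x * y * joint[i][j]
              for i, x in enumerate(xs) for j, y in enumerate(ys))
    return exy - ex * ey

# X uniform on {-1, 0, 1} and Y = X**2. Rows are x = -1, 0, 1; columns are y = 0, 1.
xs, ys = [-1, 0, 1], [0, 1]
joint = [[0, 1/3],
         [1/3, 0],
         [0, 1/3]]

print(covariance(xs, ys, joint))    # 0.0
print(mutual_information(joint))    # ≈ 0.6365 (= ln 3 - (2/3) ln 2)
```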

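The paper then uses a dual (i.e. variational) formula for the KL divergence called the Donsker-Varadhan representation. Applied to MI, it turns the calculation into an optimization problem: you maximize

\sum_{x, y} p(x,y) T(x,y) - \log \sum_{x, y} p(x)p(y) e^{T(x,y)}

over functions T(x, y), and the supremum is exactly I(X ; Y). It is reached at T(x,y) = \log \frac{p(x,y)}{p(x)p(y)} (or that plus any constant), and every other T gives a lower bound.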
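A quick numerical check of the Donsker-Varadhan objective for MI, sum p(x,y)T(x,y) - log sum p(x)p(y)e^{T(x,y)}, again in plain Python with made-up helper names. The 2×2 table is an arbitrary choice with every cell positive, so log p(x,y)/(p(x)p(y)) is finite everywhere. Plugging that T in recovers the MI from the definition, while randomly drawn T only give lower bounds.

```python
import math
import random

def marginals(joint):
    """Row sums p(x) and column sums p(y) of a joint table."""
    return [sum(row) for row in joint], [sum(col) for col in zip(*joint)]

def dv_objective(T, joint):
    """sum_{x,y} p(x,y) T(x,y) - log sum_{x,y} p(x) p(y) exp(T(x,y))."""
    px, py = marginals(joint)
    cells = [(i, j) for i in range(len(px)) for j in range(len(py))]
    first = sum(joint[i][j] * T[i][j] for i, j in cells)
    second = math.log(sum(px[i] * py[j] * math.exp(T[i][j]) for i, j in cells))
    return first - second

joint = [[0.4, 0.1],
         [0.1, 0.4]]          # correlated binary X and Y, every cell > 0
px, py = marginals(joint)

# MI straight from the definition, for comparison.
mi = sum(joint[i][j] * math.log(joint[i][j] / (px[i] * py[j]))
         for i in range(2) for j in range(2))

# The maximizer T*(x, y) = log p(x,y) / (p(x) p(y)).
T_star = [[math.log(joint[i][j] / (px[i] * py[j])) for j in range(2)]
          for i in range(2)]
print(mi, dv_objective(T_star, joint))   # both ≈ 0.1927

# Random functions T only ever give lower bounds on the MI.
random.seed(0)
others = [[[random.gauss(0, 1) for _ in range(2)] for _ in range(2)]
          for _ in range(100)]
print(all(dv_objective(T, joint) <= mi for T in others))   # True
```

Shifting T* by a constant leaves the value unchanged, because the constant adds the same amount to both terms.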

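So now you can let a neural net represent the function T and train it by gradient ascent on this objective. Both sums are expectations, so they can be estimated from samples: samples from p(x,y) are just the data pairs, and samples from p(x)p(y) can be made by shuffling the y's across a batch of pairs. That's what makes it usable for continuous, high-dimensional variables where the sums above can't be written down.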
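A minimal sketch of that idea, assuming numpy and a synthetic correlated-Gaussian dataset (none of this is the paper's code). A one-hidden-layer network T(x, y) is trained by gradient ascent on the Donsker-Varadhan bound, with the second expectation estimated on pairs whose y has been shuffled within the batch. The function names and hyperparameters are arbitrary choices for illustration; backprop and the Adam update are written out by hand so the example only needs numpy, and it leaves out the moving-average correction the paper proposes for the biased minibatch gradient.

```python
import numpy as np

def forward(params, z):
    """Statistics network T(z) = tanh(z @ W1 + b1) @ w2 with z = [x, y].
    No output bias: adding a constant to T leaves the DV bound unchanged."""
    h = np.tanh(z @ params["W1"] + params["b1"])
    return h @ params["w2"], h

def dv_bound(t_joint, t_marg):
    """mean(T on joint pairs) - log mean(exp(T on shuffled pairs)), stable form."""
    mx = t_marg.max()
    return t_joint.mean() - (mx + np.log(np.mean(np.exp(t_marg - mx))))

def mine_estimate(x, y, hidden=32, steps=3000, batch=512, lr=5e-3, seed=0):
    rng = np.random.default_rng(seed)
    params = {"W1": rng.normal(0.0, 1.0, (2, hidden)),
              "b1": rng.normal(0.0, 1.0, hidden),
              "w2": rng.normal(0.0, 0.1, hidden)}
    mom1 = {k: np.zeros_like(p) for k, p in params.items()}  # Adam first moments
    mom2 = {k: np.zeros_like(p) for k, p in params.items()}  # Adam second moments
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    for step in range(1, steps + 1):
        idx = rng.integers(0, len(x), batch)
        z_joint = np.stack([x[idx], y[idx]], axis=1)                  # ~ p(x,y)
        z_marg = np.stack([x[idx], y[rng.permutation(idx)]], axis=1)  # ~ p(x)p(y)
        z = np.concatenate([z_joint, z_marg])
        t, h = forward(params, z)
        # d(bound)/dT: +1/batch on joint pairs, -softmax(T) on shuffled pairs.
        e = np.exp(t[batch:] - t[batch:].max())
        g = np.concatenate([np.full(batch, 1.0 / batch), -e / e.sum()])
        # Backprop through the single tanh layer.
        d_pre = np.outer(g, params["w2"]) * (1.0 - h ** 2)
        grads = {"W1": z.T @ d_pre, "b1": d_pre.sum(axis=0), "w2": h.T @ g}
        for k in params:  # Adam step, used for ascent on the bound
            mom1[k] = beta1 * mom1[k] + (1 - beta1) * grads[k]
            mom2[k] = beta2 * mom2[k] + (1 - beta2) * grads[k] ** 2
            m_hat = mom1[k] / (1 - beta1 ** step)
            v_hat = mom2[k] / (1 - beta2 ** step)
            params[k] += lr * m_hat / (np.sqrt(v_hat) + eps)
    # Evaluate the trained bound on all pairs, with one fresh shuffle of y.
    t_joint, _ = forward(params, np.stack([x, y], axis=1))
    t_marg, _ = forward(params, np.stack([x, rng.permutation(y)], axis=1))
    return dv_bound(t_joint, t_marg)

# Correlated Gaussians, where the true MI is -0.5 * log(1 - rho^2).
rho, n = 0.8, 20000
rng = np.random.default_rng(1)
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho ** 2) * rng.normal(size=n)
print("true MI:", -0.5 * np.log(1 - rho ** 2))   # ≈ 0.511
print("MINE estimate:", mine_estimate(x, y))
```

The Gaussian case is handy as a sanity check because the true MI is known in closed form. Since every T gives a lower bound (up to sampling noise), the estimate should sit at or somewhat under that value; how close it gets depends on the network size and training budget.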

Neat, that looks really cool! Thanks for posting it, I’m checking out the paper now.

By the way, in case anyone would like some more, uh, information, I think this is a fantastic introduction to information theory, entropy, mutual information, etc.: