How can both generator and discriminator losses decrease?

Oh wow! I think I understand now! The output of the “discriminator” in WGANs doesn’t have a sigmoid or tanh activation because it isn’t a classifier! In WGANs the “discriminator” does something else. I will look it up now.
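For anyone else tripping over the same thing, here is a minimal PyTorch sketch of what I mean (my own illustration, not from any paper; the layer sizes are made up and only the output heads matter): a vanilla GAN discriminator squashes its output through a sigmoid so it reads as P(real), while the WGAN “critic” just emits a raw, unbounded score.

```python
import torch.nn as nn

# Vanilla GAN discriminator: the final sigmoid turns the score into P(real),
# so the output is a probability and the network is a classifier.
gan_discriminator = nn.Sequential(
    nn.Linear(784, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),
)

# WGAN "discriminator" (critic): same body, but the raw linear output is kept.
# It returns an unbounded score, not a probability -> not a classifier.
wgan_critic = nn.Sequential(
    nn.Linear(784, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)
```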

This means that the analogy is incorrect. And I still don’t know what they measured in Figure 2 :slight_smile:

The paper words it in such a way that I think I might be wrong again…

The middle and right panels show the discriminator loss and generator loss versus training steps. Using WGAN and WGAN-GP objectives, the discriminator loss decreases monotonically, correlating perfectly with training progress, consistent with the claim in Arjovsky et al. (2017b) and Gulrajani et al. (2017). In contrast, the widely used GAN objective leads to an oscillating discriminator loss, and therefore does not effectively reflect training progress. We conclude that the WGAN and WGAN-GP objectives honor their promises to correlate well with training progress, which is vital for hyperparameter tuning, detecting overfitting, or simply deciding when to terminate training.

What is the discriminator loss in WGAN? Because it surely isn’t what I am printing :slight_smile:
I do print out what my optimizer is minimizing for the discriminator (disc_loss). It is the same as in WGAN-GP, except with an extra consistency term added on, as sketched below.
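To make that concrete, here is a rough PyTorch sketch of the WGAN-GP part of what I am minimizing; my disc_loss is this plus the extra consistency term, which I leave out as a comment because its exact form isn’t the point here. It assumes inputs flattened to shape (batch, features) and the usual gp_lambda = 10.

```python
import torch

def critic_loss_wgan_gp(critic, real, fake, gp_lambda=10.0):
    # fake is assumed to be detached from the generator graph (fake = G(z).detach()).
    # Wasserstein term: push scores on real samples up, on fake samples down.
    loss = critic(fake).mean() - critic(real).mean()

    # Gradient penalty on random interpolates between real and fake batches.
    eps = torch.rand(real.size(0), 1, device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).detach().requires_grad_(True)
    grads = torch.autograd.grad(
        outputs=critic(interp).sum(), inputs=interp, create_graph=True
    )[0]
    grad_penalty = ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

    return loss + gp_lambda * grad_penalty
    # my actual disc_loss: the above + an extra consistency term
```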

Thus the “discriminator” is no longer a direct critic that tells the fake samples apart from the real ones. Instead, it is trained to learn a K-Lipschitz continuous function that helps compute the Wasserstein distance. As the loss function decreases during training, the Wasserstein distance gets smaller and the generator’s output distribution grows closer to the real data distribution.
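For completeness, the quantity behind all of this is the Kantorovich–Rubinstein dual form of the Wasserstein distance used in the WGAN paper, where f ranges over 1-Lipschitz functions and the critic approximates the maximizing f:

$$
W(\mathbb{P}_r, \mathbb{P}_g) = \sup_{\|f\|_L \le 1} \; \mathbb{E}_{x \sim \mathbb{P}_r}[f(x)] - \mathbb{E}_{x \sim \mathbb{P}_g}[f(x)]
$$

So the Wasserstein part of the critic loss is just the negative of this estimate, which is why (up to sign and the penalty terms) it tracks how far the generated distribution is from the real one.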