I would hugely appreciate it if someone could explicitly walk through the CycleGAN training procedure. I'm specifically curious about how to handle the minimax optimization problem.
It's straightforward to train the discriminators without updating the generator networks: just call .detach() on the generated images, so backprop stops at the detached tensor instead of carrying on into the generator.
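To make sure I have this part right, here's a minimal sketch of what I mean (toy modules standing in for the real networks, nothing CycleGAN-specific):

```python
import torch
import torch.nn as nn

# Toy stand-ins for a generator and discriminator
G = nn.Linear(8, 8)
D = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())
d_opt = torch.optim.SGD(D.parameters(), lr=0.01)
bce = nn.BCELoss()

real = torch.randn(4, 8)
noise = torch.randn(4, 8)
fake = G(noise).detach()  # backprop will stop here; G stays untouched

# Discriminator update: real -> 1, fake -> 0
d_loss = bce(D(real), torch.ones(4, 1)) + bce(D(fake), torch.zeros(4, 1))
d_opt.zero_grad()
d_loss.backward()
d_opt.step()
```

After this step, D's parameters have gradients but G's don't, since the graph was cut at .detach().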
However, training the generators is more complicated. You can use a gradient reversal layer and do everything at once, but that method seems to have fallen out of favor. Some people (e.g. https://arxiv.org/pdf/1711.03213.pdf) use an "inverted objective" method, which essentially means training the discriminator with the aforementioned .detach() method, and then training the generator by giving the discriminator the wrong ground truth. However, this seems pretty suboptimal: you're deliberately feeding the discriminator wrong labels in order to train the generator.
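If I understand the "inverted objective" right, the generator step looks roughly like this (again toy modules, my own sketch, not the paper's code):

```python
import torch
import torch.nn as nn

G = nn.Linear(8, 8)
D = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())
g_opt = torch.optim.SGD(G.parameters(), lr=0.01)
bce = nn.BCELoss()

noise = torch.randn(4, 8)
fake = G(noise)  # NO .detach() here: gradients must flow back into G

# "Inverted" labels: call the fakes real (1), so minimizing the BCE
# pushes G toward outputs that D scores as real.
g_loss = bce(D(fake), torch.ones(4, 1))

g_opt.zero_grad()
g_loss.backward()  # gradients are computed through D and into G
g_opt.step()       # but only G's optimizer steps, so D's weights don't move
```

Note that backward() still populates .grad on D's parameters here, which is part of why I'm unsure whether this setup quietly corrupts the discriminator update.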
Fastai seems to deal with this problem by… strategically setting requires_grad = False and calling .eval()? My basic understanding of backprop in PyTorch is that gradients for the later stages of a network need to be computed in order to compute gradients for the earlier stages. That is, it seems invalid to set requires_grad = False on a discriminator and still expect to backprop through it to a generator.
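Concretely, the pattern I'm asking about looks something like this (my own toy sketch of the idea, not fastai's actual code):

```python
import torch
import torch.nn as nn

G = nn.Linear(8, 8)
D = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())
g_opt = torch.optim.SGD(G.parameters(), lr=0.01)
bce = nn.BCELoss()

# Freeze the discriminator's parameters before the generator step
for p in D.parameters():
    p.requires_grad = False

noise = torch.randn(4, 8)
g_loss = bce(D(G(noise)), torch.ones(4, 1))

g_opt.zero_grad()
g_loss.backward()
g_opt.step()
```

My question is whether backward() here can actually deliver gradients to G, given that D's parameters are frozen.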
If anyone could explain these alternatives, their pros and cons, and the specific settings required to make each work, that would be awesome. Thanks!