On the CycleGAN loss function

A random thought on the CycleGAN loss function as I was watching the lecture video. The loss is broken into three pieces: two GAN losses (one for each pair of generator and discriminator) and one cycle-consistency loss. My question is about the cycle-consistency loss: instead of using, e.g., the L1 norm ||F(G(x)) − x||_1, why not use something that also involves the discriminator, e.g., log D(F(G(x)))?

Intuitively, the L1 loss minimizes the difference across all pixels of the image x, which means the background is also forced to match the original image after one cycle from horse to zebra to horse. That seems less than ideal. I noticed that in the example images, e.g., the zebra-vs-horse animation, the background of the generated zebra frames looks yellowish, very different from the background of the original horse frames. Could this loss be the reason? A loss involving D would only require the generator to fool the discriminator, which supposedly should learn to focus only on the object itself.
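To make the comparison concrete, here is a minimal sketch of the two options (assuming PyTorch, and assuming a discriminator D_X that outputs a probability; the names G, F, D_X are just placeholders, not the course code):

```python
import torch
import torch.nn.functional as F_nn  # avoid clashing with the generator F

def cycle_terms(x, G, F, D_X):
    """Compare the standard L1 cycle loss with a discriminator-based one.

    G: generator X -> Y (horse -> zebra), F: generator Y -> X,
    D_X: discriminator on domain X, assumed to output a probability.
    """
    recon = F(G(x))  # one full cycle: horse -> zebra -> horse

    # Standard CycleGAN term: pixel-wise L1 over the whole image, so the
    # background is also pulled back toward the original x.
    l1_cycle = F_nn.l1_loss(recon, x)

    # Proposed alternative: a non-saturating GAN-style term
    # -log D_X(F(G(x))) that only asks the reconstruction to look like a
    # real horse image, not to match this particular x pixel by pixel.
    disc_cycle = -torch.log(D_X(recon) + 1e-8).mean()

    return l1_cycle, disc_cycle
```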


But don’t we want the background to stay consistent?

This also bugged me. If we are transferring objects between domains, that is, if we want our horse to become a zebra, shouldn't we focus on the object and disregard the background, or rather, keep the background the same in the original and the translation?

Rather than forcing the network to learn to disregard the background, an object detector or instance segmentation model that focuses on the object to translate should make CycleGAN training faster; see the sketch below.
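A rough sketch of that idea (again assuming PyTorch, and a hypothetical pretrained segmenter that yields a binary object mask) would be to restrict the L1 cycle loss to the masked region:

```python
import torch

def masked_cycle_loss(x, recon, object_mask):
    """L1 cycle loss restricted to the object region.

    object_mask: (N, 1, H, W) tensor with 1 on horse/zebra pixels,
    e.g. from a hypothetical pretrained instance segmentation model.
    """
    diff = torch.abs(recon - x) * object_mask
    # Normalize by the number of masked pixels so the loss scale does
    # not depend on how large the object is.
    return diff.sum() / object_mask.sum().clamp(min=1.0)
```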

That's a fair point, but as a learning objective, I am curious to see what would happen if we only focused on the object itself.