GAN training for noise-to-image vs style transfer

if i want to train a gan to do complete image generation (like for example these pokemon generators) i assume i use random noise inputs and actual pokemon as targets (or whatever, there are enough pokemon already).

if i then fed that trained gan a photo of an actual thing would it do some kind of style transfer-ish transformation?

for “proper” style transfer i’ve seen people talking about training on only one image. do i really just train with noise and one target style image, or is it more complicated?

i don’t want to do a hundred epochs and find i was pointing in the wrong direction the whole time. is there a lesson on this i’m not seeing?

It’s more complicated. What you proposed would be a two-step process. Style transfer, in and of itself, is done based on only one style image (at least in the implementations involving fastai). A good example would be: “Make my photo look more like one of Picasso’s paintings.” Whereas with your approach it would be “Make me a random Pokemon, and then make other pokemon look similar to that pokemon.” Does this make sense to you? In the latter we generate a new pokemon and then use style transfer to make other pokemon similar to it. (Hopefully that clears up a little confusion? Or maybe I’m just being redundant :stuck_out_tongue: )

no i think i wasn’t being clear, i’m asking 3 different questions.

  1. can i do “make me a pokemon out of thin air” with random noise inputs and a whole bunch of different pokemon targets (or maybe let’s say manga as that’s a little more consistent) and the same basic gan approach from lesson 7?

  2. if i gave that gan a picture, would it attempt to create something which looked like the input in the style of the outputs it understands, e.g. turn people into manga avatars?

  3. am i supposed to do style transfer with noise input and 1 style image target?

i guess i’m confused about the difference between 2 and 3 other than the specificity of the style being transferred. could you not make a van Gogh style transfer trained on all of his work rather than a Starry Night style transfer?


I’m no expert on GANs and it’s been a while since I’ve watched the Lesson 7 video. But I think for this kind of problem - where you have random noise as input and some type of image (in your case pokemon or manga) as output - you would use a Wasserstein GAN like in this notebook: https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-wgan.ipynb
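To make the noise → image setup concrete, here’s a minimal PyTorch sketch (not the notebook’s exact code) of a DCGAN-style generator and critic. The 100-dim noise vector and 64x64 output size are illustrative assumptions, not the lesson’s actual settings:

```python
# Minimal sketch: the generator only ever sees a low-dimensional noise vector,
# and the critic scores real vs generated images. Sizes are illustrative.
import torch
import torch.nn as nn

noise_dim = 100  # size of the random input vector (an assumption)

generator = nn.Sequential(
    # project 100-d noise up to a 64x64 RGB image with transposed convolutions
    nn.ConvTranspose2d(noise_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),  # 4x4
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),        # 8x8
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),          # 16x16
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),           # 32x32
    nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),                                # 64x64
)

critic = nn.Sequential(
    # mirror of the generator, ending in a single Wasserstein score per image
    nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 1, 4, 1, 0), nn.Flatten(),
)

z = torch.randn(16, noise_dim, 1, 1)   # a batch of random noise vectors
fake_pokemon = generator(z)            # -> (16, 3, 64, 64) generated images
score = critic(fake_pokemon)           # -> (16, 1) critic scores
```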

On this one I can only speculate. The GAN would probably still produce an image that looks like the type of images it has been trained on (e.g. pokemon or manga), but I don’t think it would preserve the structure of the input image (e.g. a portrait of a human) when it creates the output, because it hasn’t been trained on that task.


1: Yes, you can train a GAN to generate pokemon-like images from random noise.
2: No, the trained GAN will need the same kind of input (gaussian noise, for instance) to generate good images. If you feed it data from a different distribution (e.g. a real image of a pokemon), you will not get a good result.
3: For style transfer (with an approach using gram matrices), you optimize an image (i.e. create a new image) with one target for the style and one target for the content (see the sketch below).
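Here’s a rough sketch of that gram-matrix optimization, assuming torchvision’s pretrained VGG19 and two placeholder image files (`content.jpg` / `style.jpg`); the layer choices follow the usual Gatys-style setup:

```python
# Gram-matrix style transfer sketch: optimize the image itself against
# one content target and one style target.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
vgg = models.vgg19(pretrained=True).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

load = transforms.Compose([transforms.Resize((256, 256)), transforms.ToTensor()])

def open_img(path):  # placeholder paths, supply your own files
    return load(Image.open(path).convert("RGB")).unsqueeze(0).to(device)

content = open_img("content.jpg")   # the photo whose structure we keep
style   = open_img("style.jpg")     # the single style target (e.g. Starry Night)

style_layers   = {0, 5, 10, 19, 28}  # conv layers used for the style (gram) loss
content_layers = {21}                # deeper layer used for the content loss

def features(x):
    feats = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in style_layers or i in content_layers:
            feats[i] = x
    return feats

def gram(f):
    b, c, h, w = f.shape
    f = f.view(c, h * w)
    return f @ f.t() / (c * h * w)

with torch.no_grad():
    content_feats = features(content)
    style_grams = {i: gram(f) for i, f in features(style).items() if i in style_layers}

# optimize the image itself, starting from the content photo
img = content.clone().requires_grad_(True)
opt = torch.optim.Adam([img], lr=0.02)

for step in range(300):
    opt.zero_grad()
    feats = features(img)
    c_loss = sum(F.mse_loss(feats[i], content_feats[i]) for i in content_layers)
    s_loss = sum(F.mse_loss(gram(feats[i]), style_grams[i]) for i in style_layers)
    loss = c_loss + 1e5 * s_loss   # the style weight here is a rough guess
    loss.backward()
    opt.step()
```

Note that you are optimizing an image, not training a network, which is why a single style image is enough.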

  1. Yes, you can generate pokemon out of thin air, but the basic GAN approach from lesson 7 won’t get you very far. It is a small model and simply too weak compared to modern SOTA architectures. It is just meant to help you get an understanding of how a very basic GAN works.
    You can check out my work here: https://github.com/Atom-101/PokeGAN. This is roughly how your results would look with a basic GAN.
  2. You cannot directly give that GAN a picture. This is simply because the GAN generator takes low-dimensional noise as input (your picture may be 256x256x3, but the GAN input would be something like 512x1). You won’t be able to feed a picture into it, but what you can do is use a separate encoder to encode your high-dimensional real image into a low-dimensional latent vector (see the sketch after this list).
    If you can find latent embeddings of real images, you can do style transfer using NVIDIA’s super popular StyleGAN architecture. StyleGAN can generate images out of thin air as well as do style transfer.
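A rough sketch of that “find a latent embedding” idea: instead of feeding the photo into the generator (the shapes don’t match), you optimize a latent vector so the generator’s output reproduces the photo. The `generator` and the 512-dim latent size here are assumptions, and real projectors like StyleGAN’s add perceptual losses and other tricks on top of this:

```python
# GAN inversion sketch: find a latent code z such that generator(z) ~= photo.
import torch
import torch.nn.functional as F

def invert(generator, target_img, latent_dim=512, steps=1000, lr=0.05):
    """target_img: (1, 3, H, W) tensor matching the generator's output size."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = generator(z)                  # generated image for the current z
        loss = F.mse_loss(recon, target_img)  # pixel loss; real projectors add perceptual terms
        loss.backward()
        opt.step()
    return z.detach()                         # latent code that approximates the photo
```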