Image to image translation

Hi everyone! I am working on an application that requires translating an image from one domain (A) into a different domain (say B). Models from the image-to-image translation family (CycleGAN, pix2pix, etc., e.g. converting horses to zebras) usually transfer style between domains whose underlying structure is similar.

But in my case, most of the styling should stay the same while the content is restructured (reordered?). As an example, the task would be similar to transforming an image of a stationary horse into one that is (seemingly) galloping. As you can see, the content is the same, but transformed into a different setting (or posture, etc.).

Can someone tell me whether it is feasible to achieve this with neural networks, preferably in an unordered-image-collections (unpaired) setting? Thanks in advance!

This is likely possible… it reminds me of problems where you change certain attributes of a person’s face (e.g., making a face smile), and some unpaired image-to-image translation approaches have been developed for that.

I would recommend trying a CycleGAN first (maybe using my library for the task :wink:) and, if that doesn’t succeed, looking into more complex models. Attention-based GAN frameworks such as AttentionGAN, SPA-GAN, etc. are probably the best suited for this task: they try to determine which parts of the image need to change and adjust only that portion.
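To make the attention idea concrete, here is a toy sketch (assuming PyTorch; the tiny conv layers are placeholders, not AttentionGAN’s actual architecture): the generator predicts both a content image and a per-pixel attention mask, and only the masked region of the input is replaced.

```python
import torch
import torch.nn as nn

class AttentionGenerator(nn.Module):
    """Toy generator: predicts a content image and an attention mask,
    then blends the content into the input only where the mask is active."""
    def __init__(self, channels: int = 3):
        super().__init__()
        # Placeholder layers; real attention-based GANs use much deeper networks.
        self.content = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.mask = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # per-pixel weight in [0, 1]
        )

    def forward(self, x):
        content = torch.tanh(self.content(x))  # generated content in [-1, 1]
        mask = self.mask(x)                    # where to apply the change
        # Masked regions take the generated content; the rest passes through.
        return mask * content + (1 - mask) * x

g = AttentionGenerator()
x = torch.rand(1, 3, 64, 64) * 2 - 1  # fake input image in [-1, 1]
y = g(x)
print(y.shape)  # torch.Size([1, 3, 64, 64])
```

The blending step is the key design choice: because unmasked pixels are copied straight from the input, the background is preserved for free and the generator’s capacity is spent only on the region it decides to alter.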


Thanks for the response! Yeah, I have tried solving this with a CycleGAN, but it doesn’t seem to work. Maybe my application contradicts the underlying assumption CycleGAN operates with: that the image structure stays the same across domains and only the style varies (winter to summer, striped to plain, etc.). I ran it on a small dataset, and the output made me think I am forcing the GAN to do something it’s not built to do.

My task involves more structural change than merely modifying an attribute (like normal to smiling face). A good example of what I need is converting an image of a car into a Transformer (from the movie!). Here, all elements of the output image are borrowed from the source image/domain but are massively reoriented/restructured. That is my concern even with the attention-based GANs: whether this level of transformation can be achieved.

At this point, I am not sure if GANs are a solution to this at all, because even in a paired setting the generator output doesn’t necessarily match the ground truth.

I will try out the attention-based GANs; if you can think of other methods that could solve this task, please share.

Thank you!
