Is there a way to set up a neural network to transform an image from one geometric orientation to another, using a paired dataset?
The constraint is that the input and output need to be mapped one to one (and this likely rules out the adversarial setups, where the generator only ever receives indirect feedback and doesn't map to the ground truth during inference).
I am wondering if a typical network used for classification could be used to output an image, i.e. pixel values instead of class likelihoods. Can a single network be used in such a way, receiving direct feedback through a loss function that compares the network's output image with the paired image corresponding to the input image?
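For reference, the direct-supervision idea above can be sketched as a small encoder-decoder in PyTorch. This is a minimal, hypothetical example (the layer sizes, channel counts, and names are illustrative, not from any specific model): the network outputs pixel values, and the loss compares them directly to the paired target image.

```python
import torch
import torch.nn as nn

class ImageToImageNet(nn.Module):
    """Minimal encoder-decoder: takes an image and outputs an image of
    the same size (pixel values instead of class likelihoods)."""
    def __init__(self, channels=1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, channels, 4, stride=2, padding=1),
            nn.Sigmoid(),  # keep pixel values in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Direct supervision: compare the output image to the paired target image.
net = ImageToImageNet()
loss_fn = nn.L1Loss()
x = torch.rand(4, 1, 32, 32)       # batch of input images
target = torch.rand(4, 1, 32, 32)  # paired ground-truth images
out = net(x)
loss = loss_fn(out, target)
loss.backward()  # the generator gets direct, per-pixel feedback
```

In practice a U-Net-style architecture with skip connections is the common choice for this kind of paired image-to-image regression, but the training loop looks the same.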
Maybe with a combination of a (Cycle)GAN and a Siamese network, where image A and image B show the same object in different geometric orientations: a generator uses image A to build gen-image A, and the discriminator compares gen-image A with image B.
At least that's what I can imagine; currently I don't have the skill to build such a model.
If your goal is to train a network that finds an image transformation, you can create a model that predicts the transformation directly (i.e. a 2x3 matrix for a full 2D affine transform, or just 3 parameters, rotation about Z plus translation in X and Y, and build the 2x3 matrix yourself to ensure the transformation stays rigid).
Then apply this transformation to the original image and compute the loss.
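A minimal sketch of this idea in PyTorch (the layer sizes and names are illustrative assumptions, and single-channel images are assumed): the network predicts 3 parameters, the 2x3 matrix is assembled by hand so only rigid transforms are possible, and `affine_grid`/`grid_sample` apply it differentiably so the pixel loss can backpropagate into the parameter predictor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RigidTransformNet(nn.Module):
    """Predicts 3 parameters (rotation, tx, ty) from the input image and
    assembles the 2x3 affine matrix itself, guaranteeing rigidity."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, 3),  # -> [theta, tx, ty]
        )

    def forward(self, x):
        p = self.features(x)
        theta, tx, ty = p[:, 0], p[:, 1], p[:, 2]
        cos, sin = torch.cos(theta), torch.sin(theta)
        # Rotation + translation only: no scale or shear can appear.
        row0 = torch.stack([cos, -sin, tx], dim=1)
        row1 = torch.stack([sin, cos, ty], dim=1)
        return torch.stack([row0, row1], dim=1)  # shape (B, 2, 3)

net = RigidTransformNet()
x = torch.rand(4, 1, 32, 32)       # source images
target = torch.rand(4, 1, 32, 32)  # paired images in the other orientation

matrix = net(x)
# Warp the source with the predicted transform, then compare to the target.
grid = F.affine_grid(matrix, x.shape, align_corners=False)
warped = F.grid_sample(x, grid, align_corners=False)
loss = F.l1_loss(warped, target)
loss.backward()  # gradients flow back through the sampling into the 3 params
```

Because the sampling step is differentiable, the whole pipeline trains end to end with a plain pixel loss, and the predicted matrix can never leave the rigid family.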
Thanks Stefano. This looks like a good solution, but my specific task may need a more complex transformation of the source image. I have left you a personal message with more details; if you can take a quick look and share your opinion, that'd be great.
Hey, thanks for the response, but those GANs only transfer style between domains and are not suitable for my task, which needs a specific transformation of the source content. Think of it like using the image of a car to draw a Transformer (like Bumblebee from the movies).