SRGAN: How to adapt the model to the input image?

I wrote and trained my own SRGAN, so I now have a generator model that takes 32x32 images as input and outputs an improved 128x128 version.

However, the end users of my Android app will send images of any size, 3800x2800, 53x12, etc.

How can I run my SRGAN on such images? Should I change the generator’s training to handle inputs of any dimensions (departing from the original SRGAN research paper)? Or can I change the shape of the model’s input layer on the fly?
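
To illustrate what I mean by “on the fly”: the generator in the original paper is fully convolutional, so in principle the same trained kernels should work on any input size if the model is rebuilt with unspecified spatial dimensions. A minimal Keras sketch with placeholder layers (not my actual architecture):

```python
# Minimal sketch (placeholder layers, not the real SRGAN generator):
# a fully convolutional model accepts inputs of any spatial size,
# because conv kernels don't depend on the width/height of the input.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_generator(input_shape=(None, None, 3)):
    inp = keras.Input(shape=input_shape)   # spatial dims left unspecified
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
    # 4x upscaling via two stride-2 transposed convolutions (stand-in
    # for the paper's sub-pixel / PixelShuffle upsampling blocks)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)
    return keras.Model(inp, out)

gen = build_generator()
lr = np.random.rand(1, 53, 12, 3).astype("float32")   # any size works
print(gen.predict(lr).shape)                           # (1, 212, 48, 3)
```

If something like this works, the weights trained on 32x32 crops could simply be copied into the size-agnostic model with set_weights().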

Note: https://deepai.org/machine-learning-model/torch-srgan - they actually did it! I don’t know how…


Well, it seems like you could take the input image and cut it up into 32x32 sections and super-res each section to 128x128 and stitch them back together into the final image.
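
For instance, a rough NumPy sketch of that tile-and-stitch idea (nearest-neighbour upscaling stands in for the generator call, and the image dimensions are assumed to be exact multiples of 32):

```python
import numpy as np

def upscale_tile(tile):
    """Stand-in for generator.predict: nearest-neighbour 4x upscaling."""
    return np.repeat(np.repeat(tile, 4, axis=0), 4, axis=1)

def super_res_tiled(img, tile=32, scale=4):
    """Cut img into tile x tile patches, upscale each, stitch the results."""
    h, w, c = img.shape
    out = np.zeros((h * scale, w * scale, c), dtype=img.dtype)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            out[y * scale:(y + tile) * scale,
                x * scale:(x + tile) * scale] = upscale_tile(img[y:y + tile, x:x + tile])
    return out

img = np.random.rand(64, 64, 3)      # 64x64 -> four 32x32 tiles
print(super_res_tiled(img).shape)    # (256, 256, 3)
```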


Thank you for your help.

Cutting the input into 32x32 patches

Two problems exist with this approach:

  1. Are we sure the SRGAN can actually super-res a patch? It’s a CNN, so it’s trained to recognize patterns; if we cut the image into several 32x32 patches, many patterns would be split across patch boundaries and become unrecognizable, so the SRGAN might not be able to super-res them (and the cutting might cause other problems as well).

  2. Assuming the answer to 1. is yes: how do we deal with images that can’t be cut exactly into 32x32 patches? Take the simplest case, a 33x32 image. We’d cut it into one 32x32 patch and one 1x32 patch; the first could be super-resed, but not the second. Here the leftover is only 1px, but in real examples it would often be larger.

Manually extending the input image to 32x32

This is an alternative solution I thought of. Imagine the Android app’s users can only send images whose width and height are both <= 32.
Then the app adds black rows and columns of pixels around the (<=32 ; <=32) input image so that a 32x32 image is sent to the SRGAN.
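
The padding itself is simple; for reference, a NumPy sketch that centres the small image on a black 32x32 canvas:

```python
import numpy as np

def pad_to(img, size=32):
    """Centre img (height and width <= size) on a black size x size canvas."""
    h, w, c = img.shape
    out = np.zeros((size, size, c), dtype=img.dtype)   # black background
    top, left = (size - h) // 2, (size - w) // 2
    out[top:top + h, left:left + w] = img
    return out

print(pad_to(np.random.rand(12, 20, 3)).shape)   # (32, 32, 3)
```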

But the SRGAN doesn’t handle this well; the solution clearly doesn’t work.
I’ve tried it: I trained my SRGAN on a set of only one image and, at each epoch, output a test result. The test used the black-rows-and-columns version of that training image. Result: the SRGAN wasn’t able to recognize the training image inside the black border and could not super-res it. I then tested with the training image itself (without the black border): the SRGAN could super-res it, so my SRGAN isn’t buggy (i.e. the problem doesn’t come from it).

Finally…

What should I do? Are you really sure cutting the image into 32x32 patches would work? How should I deal with the problems I described for that approach?
Is there any other solution? Do you have any idea how DeepAI did it (the website is linked in my post)?

Thank you again for your help!

I doubt I can be of any help and may be totally clueless but here are some random thoughts. Most of this or all of it may be complete garbage.

For #1, as Jeremy says many times: why not give it a try and report back? It shouldn’t be too hard to take a 64x64 image, cut it into four 32x32 images, super-res each, then put them together and see what happens. Thinking about it from another perspective: if your end user gives you a 32x32 image to super-res, there is no way of knowing whether that image is a whole image or a cropped portion of one, and the algorithm should be able to super-res it regardless. There may be some issues or artifacts at the boundaries when putting the images back together that need to be dealt with, but let’s think about #2 for a moment.

For #2, let’s assume #1 works. If you super-res the upper-left-most 32x32 region, you get the upper-left-most 128x128 super-res output. Now say you shift over one pixel column and super-res that 32x32 section. I would expect this 128x128 output to match the first one very closely for most pixels, except near the side where a pixel column was removed and the side where a new pixel column was added. Shifting over two pixels should give output similar to the one-pixel shift, which is similar to the original, and so on. So my point is that, given a 33x32 image as in your example above, I wouldn’t create a final 1x32 slice; instead I’d take the 32x32 window that includes that 1x32 region. This window overlaps most of what was already computed in the first 32x32 pass, but it also produces the final 4x128 strip of output you need. The same process works for any image whose dimensions don’t divide evenly by 32. I’m not sure if I’ve explained it very clearly.
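
In code, a rough NumPy sketch of that overlapping-last-tile idea might look like this (nearest-neighbour upscaling stands in for the generator call, and the image is assumed to be at least 32 pixels in each dimension):

```python
import numpy as np

def tile_starts(length, tile=32):
    """Tile offsets along one axis; the last tile is shifted back so it
    still covers a full window (it overlaps the previous one)."""
    starts = list(range(0, max(length - tile, 0) + 1, tile))
    if starts[-1] != length - tile:          # leftover strip at the end
        starts.append(length - tile)         # overlap with the previous tile
    return starts

def super_res_overlap(img, upscale_tile, tile=32, scale=4):
    h, w, c = img.shape
    out = np.zeros((h * scale, w * scale, c), dtype=img.dtype)
    for y in tile_starts(h, tile):
        for x in tile_starts(w, tile):
            out[y * scale:(y + tile) * scale,
                x * scale:(x + tile) * scale] = upscale_tile(img[y:y + tile, x:x + tile])
    return out

nn4x = lambda t: np.repeat(np.repeat(t, 4, axis=0), 4, axis=1)   # generator stand-in
print(super_res_overlap(np.random.rand(33, 32, 3), nn4x).shape)  # (132, 128, 3)
```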

If artifacts do appear at the boundaries where the tiles are spliced together, then a similar approach could possibly be taken: super-res an extra 32x32 section that puts the boundary in its middle, and use that output either as a sort of blending factor or as a complete replacement for the boundary area.
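
For illustration, a small NumPy sketch of feathering a vertical seam this way: the seam-centred tile’s output is blended into the already-stitched result with a linear ramp instead of a hard overwrite (the overlap width here is an arbitrary choice):

```python
import numpy as np

def blend_columns(stitched, seam_sr, x0, overlap):
    """Blend `overlap` output columns starting at x0, ramping from the
    stitched image (weight 1 -> 0) to the seam tile (weight 0 -> 1).
    Assumes seam_sr's left edge lines up with column x0 of `stitched`."""
    w = np.linspace(0.0, 1.0, overlap)[None, :, None]   # per-column weights
    stitched[:, x0:x0 + overlap] = (
        (1 - w) * stitched[:, x0:x0 + overlap] + w * seam_sr[:, :overlap]
    )
    return stitched

a = np.ones((128, 256, 3))    # already-stitched output (all ones)
b = np.zeros((128, 128, 3))   # seam-centred tile output (all zeros)
print(blend_columns(a, b, x0=124, overlap=4)[0, 124:128, 0])  # ramps 1.0 -> 0.0
```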


Thank you! Yes I’ll try this. :slight_smile:

Hope this helps. Let me know how it goes.