Anyone tried resolutions larger than 224x224 for training?


I’m working on a classification task where the input images for the different classes are very similar and only differ in very small areas. Original images are of size 900x1200.

As such, resizing to 224x224 for ResNet transfer learning does not seem adequate for the network to pick up the signal in these small areas.

There’s another thread that discusses cropping out tiles from the original image for training:

However, the small features can occur anywhere in the image, so it is hard to automate the cropping to those areas. I was hoping to just use the whole image and let the network learn those small features on its own during training.

I’m wondering if anyone has tried higher resolutions without cropping out tiles, for example 448x448. Would this even work, and if so, would training take so long and require so much memory that it’s not worth doing?

It’s mostly a matter of having enough GPU memory, and you can reduce your batch size to compensate.

GTX 2080/2080Ti RTX for Deep Learning? has some benchmarks, but I top out at 320x320.

Thanks for the reply Ralph!

I managed to train at 560x560 in the end; it took approximately 15GB of VRAM. Results turned out much better thanks to the higher resolution.