I’m working on a classification task where the input images for the different classes are very similar and differ only in very small areas. The original images are 900x1200.
As such, resizing to 224x224 for ResNet transfer learning does not seem adequate for the network to pick up the signals in these small areas.
There’s another thread that discusses cropping out tiles from the original image for training:
However, the small features can occur anywhere so it is hard to automate the cropping to these areas. I was hoping to just use the whole image and let the network learn those small features automatically during training.
I’m wondering if anyone has tried higher resolutions without cropping out tiles, for example 448x448. Would this even work, and if so, would training take so long and need so many resources that it’s not worth doing?
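
To make the question more concrete, here is a rough back-of-the-envelope sketch of what a larger input would mean. It assumes the standard ResNet layout (a 7x7 stride-2 stem conv, a 3x3 stride-2 max pool, then three stride-2 stages) and a model that ends in global/adaptive average pooling, as torchvision's ResNets do, so the classifier head does not depend on the input size. The helper names `conv_out` and `resnet_feature_map` are made up for illustration, and the cost estimate is only a first-order approximation (convolution compute and activation memory growing roughly with pixel count), not a measurement.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a conv/pool layer along one dimension (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet_feature_map(height, width):
    """Spatial size of the last feature map, before global average pooling.

    Assumes the standard ResNet downsampling layers, given as
    (kernel, stride, padding): stem conv, max pool, then three stride-2 stages.
    """
    layers = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]
    for kernel, stride, padding in layers:
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
    return height, width


def pixel_ratio(size_a, size_b):
    """How many times more pixels size_a has than size_b (rough cost proxy)."""
    return (size_a[0] * size_a[1]) / (size_b[0] * size_b[1])


print(resnet_feature_map(224, 224))          # (7, 7)
print(resnet_feature_map(448, 448))          # (14, 14)
print(pixel_ratio((448, 448), (224, 224)))   # 4.0

# How much the original 900x1200 image shrinks along each axis:
print(round(900 / 224, 2), round(900 / 448, 2))    # 4.02 2.01
print(round(1200 / 224, 2), round(1200 / 448, 2))  # 5.36 2.68
```

If these assumptions hold, a 448x448 input still passes through the network, with a 14x14 final feature map instead of 7x7, at roughly 4x the compute and activation memory per image. In practice that would likely mean a smaller batch size on the same GPU.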