I am currently working on an image segmentation task where my input images are between 80x80 and 100x100 pixels and my target area sometimes measures only a few pixels. Currently, I am experimenting with different segmentation models, especially different U-Nets.
During downsampling the resolution is continually reduced, so that after the fourth downsampling step the resolution of the feature maps is only 10x10 (80x80 input) or 13x13 (100x100 input). So, information about the target areas might be lost in the lowest-level features.
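To make the shrinkage concrete, here is a small sketch that computes the spatial size after each stride-2 reduction. It assumes each step halves height/width with ceiling rounding (as e.g. `ceil_mode=True` pooling would); with floor rounding the odd sizes come out one pixel smaller. The function name is mine, just for illustration:

```python
import math

def feature_map_sizes(size, steps, rounding=math.ceil):
    """Spatial size after each of `steps` stride-2 reductions."""
    sizes = [size]
    for _ in range(steps):
        size = rounding(size / 2)
        sizes.append(size)
    return sizes

print(feature_map_sizes(80, 4))   # [80, 40, 20, 10, 5]
print(feature_map_sizes(100, 4))  # [100, 50, 25, 13, 7]
```

At which level a given size appears depends on whether the network's stem already downsamples, but the halving pattern is the same either way.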
Is there any information available about the minimal recommended input size for U-Nets or on how to most effectively reduce downsampling?
I’m no expert on U-Nets, but I think the skip connections in the architecture address the problem of downsampling an image to a very small grid size before upsampling again. To avoid the information loss from downsampling, the network can access higher-resolution activations (the left part of the U) during the upsampling process (the right part of the U). But I have no answer to the question of how far you can push the downsampling in terms of grid size.
The fastai book includes a short but clear explanation of this process in chapter 15.
Also, the following is mentioned: “One challenge with U-Nets is that the exact architecture depends on the image size. fastai has a unique DynamicUnet class that autogenerates an architecture of the right size based on the data provided.”
So the fastai implementation of DynamicUnet should automatically find the right architecture depending on your image size. Did you try that?
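To illustrate the skip-connection idea (not fastai’s DynamicUnet, just a hypothetical two-level toy U-Net in plain PyTorch): the full-resolution encoder activations are concatenated into the decoder, so fine detail that was lost in the small bottleneck grid is still available when upsampling.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy two-level U-Net; layer sizes are arbitrary choices."""
    def __init__(self, in_ch=1, base=16, out_ch=1):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(base * 2, base * 2, 3, padding=1), nn.ReLU())
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        # decoder conv inputs are wider because the skip is concatenated
        self.dec2 = nn.Sequential(nn.Conv2d(base * 4, base * 2, 3, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.Conv2d(base * 3, base, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(base, out_ch, 1)

    def forward(self, x):
        s1 = self.enc1(x)                  # full resolution
        s2 = self.enc2(self.pool(s1))      # 1/2 resolution
        b = self.bottleneck(self.pool(s2)) # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up(b), s2], dim=1))   # skip from enc2
        d1 = self.dec1(torch.cat([self.up(d2), s1], dim=1))  # skip from enc1
        return self.head(d1)

out = TinyUNet()(torch.randn(1, 1, 80, 80))
print(out.shape)  # torch.Size([1, 1, 80, 80])
```

Note the input side lengths need to be divisible by 4 here so the upsampled maps line up with the skips; handling uneven sizes automatically is exactly the kind of bookkeeping DynamicUnet does for you.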
Hey Stefan, thanks for your answer.
I was not aware that DynamicUnet generates an architecture depending on the input size. I cannot use DynamicUnet because my input is 3D, but I will dive into its implementation and try to adapt the approach.