I have seen that fastai also makes it very easy to resize images to a rectangular shape. I was wondering, then: why do we always use square images in the course? Is there a computational reason behind it?
Moreover, we also tend to use more or less the same sizes (e.g. 224 or 320). Is there a reason for that?
Mostly convention. A seminal paper (AlexNet) used 224 in its model architecture diagram because it fit neatly with that model's layers and kernel, stride, and padding sizes. IIRC there was some mix-up with 227. Subsequent models tried to beat it and kept the value the same (well, why not?). It seems to work well as a size. Conventions arise in surprisingly sticky ways; look at audio and 44.1kHz.
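The 224 vs 227 mix-up can be checked with the standard convolution output-size formula, floor((n + 2p − k) / s) + 1. A minimal sketch (the kernel and stride are AlexNet's first layer; the padding values are my assumption about how the two sizes were reconciled, not something stated above):

```python
def conv_out(n, k, s, p=0):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# AlexNet's first layer uses an 11x11 kernel with stride 4.
# A 227px input needs no padding to produce a 55x55 feature map...
print(conv_out(227, k=11, s=4, p=0))  # 55
# ...while a 224px input yields the same 55x55 map with padding 2.
print(conv_out(224, k=11, s=4, p=2))  # 55
```

So either size can describe the same network, which may be where the confusion came from.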
Oh, I see. So in our case, since there is a fully connected layer at the end, the input image size should not matter that much, right? So most likely 224×224 works the same as 220×240, and so on…
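One caveat worth adding: a plain fully connected layer actually fixes the input size, since its weight matrix has a set number of inputs. What makes modern architectures size-flexible is a global (adaptive) average pool before the head, which collapses any H×W feature map to one value per channel. A rough plain-Python sketch of that idea (the helper name is mine, for illustration only):

```python
def global_avg_pool(fmap):
    """Average each channel's HxW grid down to a single number, so the
    head always receives one value per channel, whatever H and W were."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in fmap]

# Two feature maps with different spatial sizes but the same channel count:
small = [[[1.0, 3.0], [5.0, 7.0]]]        # 1 channel, 2x2
large = [[[4.0] * 5 for _ in range(5)]]   # 1 channel, 5x5
print(len(global_avg_pool(small)), len(global_avg_pool(large)))  # 1 1
```

Both pooled outputs have the same length, so the fully connected head never notices the input resolution changed.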
Image size does matter, specifically for speed. To make the best use of the CUDA cores, image sizes (and everything else) should be divisible by 8 (this was one of the ways Jeremy et al. brought down the time to train NLP models a few months ago).
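If you want to snap an arbitrary size up to the nearest multiple of 8, a one-line helper does it (the function name is mine, not part of fastai):

```python
def round_up(n, multiple=8):
    """Round n up to the nearest multiple, using integer arithmetic."""
    return ((n + multiple - 1) // multiple) * multiple

print(round_up(220), round_up(224), round_up(225))  # 224 224 232
```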
The highest I’ve ever personally gone is 448×448.