I have seen that fastai also makes it very easy to resize images to a rectangular shape. I was wondering, then: why do we always use square images in the course? Is there a computational reason behind it?
Moreover, we also tend to use more or less the same sizes (e.g. 224 or 320). Is there a reason for that?
Mostly convention. A seminal paper (AlexNet) used 224 in its model architecture diagram because it fit neatly with that model's layers and kernel, stride, and padding sizes. IIRC there was some mix-up with 227. Subsequent models tried to beat it and kept the value the same (well, why not?). It seems to work well as a size. Conventions arise in surprisingly sticky ways; look at audio and 44.1kHz.
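The 224 vs 227 mix-up can be checked with the standard convolution output-size formula, floor((n + 2p − k) / s) + 1. A minimal sketch (the kernel and stride are AlexNet's first layer; the padding values are my assumption about how the two sizes were reconciled, not something stated above):

```python
def conv_out(n, k, s, p=0):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# AlexNet's first layer uses an 11x11 kernel with stride 4.
# A 227px input needs no padding to produce a 55x55 feature map...
print(conv_out(227, k=11, s=4, p=0))  # 55
# ...while a 224px input yields the same 55x55 map with padding 2.
print(conv_out(224, k=11, s=4, p=2))  # 55
```

So either size can describe the same network, which may be where the confusion came from.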
Oh, I see. So in our case, since there is a fully connected layer at the end, the input image size should not matter that much, right? So most likely 224×224 works the same as 220×240, and so on…
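One caveat worth adding: a plain fully connected layer actually fixes the input size, since its weight matrix has a set number of inputs. What makes modern architectures size-flexible is a global (adaptive) average pool before the head, which collapses any H×W feature map to one value per channel. A rough plain-Python sketch of that idea (the helper name is mine, for illustration only):

```python
def global_avg_pool(fmap):
    """Average each channel's HxW grid down to a single number, so the
    head always receives one value per channel, whatever H and W were."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in fmap]

# Two feature maps with different spatial sizes but the same channel count:
small = [[[1.0, 3.0], [5.0, 7.0]]]        # 1 channel, 2x2
large = [[[4.0] * 5 for _ in range(5)]]   # 1 channel, 5x5
print(len(global_avg_pool(small)), len(global_avg_pool(large)))  # 1 1
```

Both pooled outputs have the same length, so the fully connected head never notices the input resolution changed.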
Image size does matter, specifically for speed. To make the best use of the CUDA cores, image sizes (and everything else) should be divisible by 8 (this was one of the ways Jeremy et al. brought down the time to train NLP models a few months ago).
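If you want to snap an arbitrary size up to the nearest multiple of 8, a one-line helper does it (the function name is mine, not part of fastai):

```python
def round_up(n, multiple=8):
    """Round n up to the nearest multiple, using integer arithmetic."""
    return ((n + multiple - 1) // multiple) * multiple

print(round_up(220), round_up(224), round_up(225))  # 224 224 232
```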
The highest I’ve ever personally gone is 448×448.