Why does VGG use 224x224 images?

craic · February 15, 2018, 4:27pm

Why did the folks behind VGG16 use 224x224 pixel images?

It seems like an odd choice compared to, say, 256x256.

I’ve read the paper (https://arxiv.org/pdf/1409.1556.pdf) but don’t see that choice explained.

Is it determined by working back from the dimensions of later layers?

NathanYee · February 15, 2018, 6:39pm

Objects in the ImageNet dataset often appear in the middle of the image. After 5 max-pools, each of which halves the height and width, a 224x224 image becomes 7x7, which has a distinct center point. A 256x256 image becomes 8x8, which does not. There might be more to it, but this is what I remember.
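To make the arithmetic concrete, here is a rough sketch of the size calculation. It assumes each max-pool is 2x2 with stride 2 and that the convolutions in between are padded so they keep the spatial size, which is how I read the VGG paper. The helper name `feature_map_side` is just something I made up for illustration.

```python
def feature_map_side(input_side: int, num_pools: int = 5) -> int:
    """Side length of the feature map after `num_pools` max-pools.

    Assumes 2x2 pooling with stride 2 (each pool halves the side,
    rounding down) and size-preserving convolutions in between.
    """
    side = input_side
    for _ in range(num_pools):
        side //= 2
    return side


for side in (224, 256):
    out = feature_map_side(side)
    # An odd side length means there is exactly one central cell.
    has_center = out % 2 == 1
    print(f"{side}x{side} -> {out}x{out}, single center cell: {has_center}")
```

This prints 7x7 with a center cell for 224 and 8x8 without one for 256. It only checks the arithmetic; it doesn't show that this was the authors' actual reason.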