Why does VGG use 224x224 images?

Why did the folks behind VGG16 use 224x224 pixel images?

It seems like an odd choice compared to, say, 256x256.

I’ve read the paper (https://arxiv.org/pdf/1409.1556.pdf) but don’t see that choice explained.

Is it determined by working back from the dimensions of later layers?

Objects often appear in the middle of images in the ImageNet dataset. Each of VGG's 5 max-pool layers halves the spatial size, so a 224x224 image ends up as a 7x7 feature map. Because 7 is odd, that map has a single center cell. A 256x256 image ends up as 8x8, which has no distinct center cell. There might be more to it, but this is what I remember.
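A quick sketch of the arithmetic above. It assumes, as in VGG, that the 3x3 convolutions are padded so they keep the spatial size, and that only the five 2x2, stride-2 max-pools shrink the map. The function names are just for illustration. This only checks the size arithmetic; it doesn't prove the center-cell argument was the authors' actual reason.

```python
def feature_map_size(input_size: int, num_pools: int = 5) -> int:
    """Spatial size after num_pools 2x2/stride-2 max-pool layers.

    Assumes size-preserving (padded) convolutions, so each pool
    simply halves the size (floor division).
    """
    size = input_size
    for _ in range(num_pools):
        size //= 2
    return size


def has_center_cell(size: int) -> bool:
    """An odd-sized grid has a single middle cell; an even-sized one does not."""
    return size % 2 == 1


for side in (224, 256):
    fm = feature_map_size(side)
    print(f"{side}x{side} -> {fm}x{fm}, single center cell: {has_center_cell(fm)}")
# 224x224 -> 7x7, single center cell: True
# 256x256 -> 8x8, single center cell: False
```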
