I see that in VGG16 the input image dimensions are set to 224x224. Here are my questions regarding it:
How is an image truncated/padded if it has different dimensions? Is there any historical evidence on which one is worse?
Intuitively, it seems like we should try to minimize truncation/padding by setting the dimensions to the mean dimensions of the images?
For the Redux competition, I did some analysis of the input images; here are the stats:
Dogs (mean, std) = ([399.3507, 374.5622], [107.4896, 100.5482])
Cats (mean, std) = ([400.6467, 370.3812], [108.2237, 99.597])
It clearly seems like we can do better than 224x224, but I am curious about the standard deviation portion and want to understand how we can minimize the loss of data (truncation) / added noise (padding).
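For reference, here is a minimal sketch of how such dimension statistics could be gathered (the folder paths are placeholders, and it assumes the images can be read with PIL and numpy is available):

```python
import os
import numpy as np
from PIL import Image

def dimension_stats(folder):
    """Return the mean and std of (width, height) over all images in a folder."""
    sizes = []
    for fname in os.listdir(folder):
        if fname.lower().endswith(('.jpg', '.jpeg', '.png')):
            with Image.open(os.path.join(folder, fname)) as im:
                sizes.append(im.size)  # PIL reports (width, height)
    sizes = np.array(sizes)
    return sizes.mean(axis=0), sizes.std(axis=0)

# e.g. dimension_stats('train/dogs'), dimension_stats('train/cats')
```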
Yes, except that if it's too big, it doesn't just downsample it; it "squishes" it if it's not already square. This has a big impact on performance and probably isn't a good idea - the original VGG authors instead added black borders to make it square, which I think is better. I've started working on adding this to keras.
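To illustrate the difference, here is a rough sketch of the border-padding approach (this uses PIL rather than keras, purely as an assumed example; the function name and interpolation choice are mine):

```python
from PIL import Image

def pad_to_square_and_resize(path, target=224):
    """Pad the shorter side with black so the image is square, then resize.

    A plain resize to (target, target) distorts the aspect ratio ("squishes");
    padding to a square first preserves the original proportions.
    """
    im = Image.open(path).convert('RGB')
    w, h = im.size
    side = max(w, h)
    canvas = Image.new('RGB', (side, side), (0, 0, 0))    # black square canvas
    canvas.paste(im, ((side - w) // 2, (side - h) // 2))  # centre the original
    return canvas.resize((target, target), Image.BILINEAR)
```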