Input image dimensions

Hi @jeremy, @rachel,

I see that in VGG16 the input image dimensions are set to 224x224. Here are my questions regarding it:

  1. How is an image truncated/padded if it has different dimensions? Is there any historical evidence on which one is worse?
  2. Intuitively, it seems like we should try to minimize truncation/padding by setting the input dimensions to the mean dimensions of the images?
  3. For the redux competition, I did some analysis of the input image dimensions; here are the stats, with a quick sketch of the computation below this list:
    Dogs: mean = [399.3507, 374.5622], std = [107.4896, 100.5482]
    Cats: mean = [400.6467, 370.3812], std = [108.2237, 99.597]
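A rough sketch of that computation (the folder paths are just placeholders for however your training data is laid out):

import os
import numpy as np
from PIL import Image

def dimension_stats(folder):
    # Collect (width, height) for every image in the folder
    sizes = []
    for fname in os.listdir(folder):
        if fname.lower().endswith(('.jpg', '.jpeg', '.png')):
            with Image.open(os.path.join(folder, fname)) as im:
                sizes.append(im.size)  # PIL reports (width, height)
    sizes = np.array(sizes)
    return sizes.mean(axis=0), sizes.std(axis=0)

# e.g. dimension_stats('train/dogs') and dimension_stats('train/cats')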

It clearly seems like we can do better than 224x224. But I am curious about the standard deviation portion, and I would like to understand how we can minimize the loss of data (truncation) / added noise (padding).

Thanks,

Also, in vgg_preprocess I see vgg_mean = [123.68, 116.779, 103.939]. I can't seem to figure out where these numbers came from.

In the mnist example I see a different Lambda function to preprocess the input:
def norm_input(x): return (x-mean_px)/std_px

I would be interested in intuitively understanding why this is a good thing (http://www.d.umn.edu/~deoka001/Normalization.html).
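For context, here is a minimal sketch of how that Lambda gets wired into a model (assuming Theano-style channels-first ordering; the layers after the Lambda are just placeholders):

import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Lambda, Flatten, Dense

(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = np.expand_dims(X_train, 1).astype(np.float32)  # -> (N, 1, 28, 28)

# Scalar statistics computed once from the training pixels
mean_px = X_train.mean().astype(np.float32)
std_px = X_train.std().astype(np.float32)

def norm_input(x):
    # Zero-center and rescale every pixel with the training-set stats
    return (x - mean_px) / std_px

model = Sequential([
    Lambda(norm_input, input_shape=(1, 28, 28)),  # normalization baked into the model
    Flatten(),
    Dense(10, activation='softmax'),  # placeholder head
])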

Answers to your 1st question are here: Position, Size, Orientation, and Image Dimension

These numbers are the average pixel value of each channel in the imagenet dataset. They are provided by the VGG authors.
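Concretely, the preprocessing just subtracts those per-channel means from every pixel (and flips the channels from RGB to BGR, since the original weights were converted from Caffe). A rough sketch, assuming channels-first batches, not the exact course code:

import numpy as np

# Per-channel ImageNet means published by the VGG authors (RGB order)
vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3, 1, 1))

def vgg_preprocess(x):
    # x: batch of shape (batch, 3, height, width), channels in RGB order
    x = x - vgg_mean       # zero-center each channel with the ImageNet mean
    return x[:, ::-1]      # RGB -> BGR, matching the converted Caffe weights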

There are some pointers here: http://stats.stackexchange.com/questions/41704/how-and-why-do-normalization-and-feature-scaling-work

Thanks Jeremy!

So, to summarize my understanding so far:

  1. The reason most networks use 224x224 is that ImageNet uses those dimensions.
  2. If an image is larger than 224x224, Keras downsamples it.
  3. If an image is smaller than 224x224, Keras stretches it.

It seems like how we resize the images can have a non-trivial impact on accuracy. I am trying to find where this downsampling and stretching is implemented in Keras. Is it this one: https://github.com/fchollet/keras/blob/master/keras/layers/core.py#L255

With respect to the Lambda function, do you recommend always using vgg_preprocess when reusing/retraining the VGG model?

Yes, except that if it's too big, it doesn't just downsample it, it "squishes" it if it's not already square. This has a big impact on performance, and probably isn't a good idea - the original VGG authors instead added black borders to make it square, which I think is better. I've started working on adding this to Keras.
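To picture the black-border approach, here's a quick sketch with PIL (just an illustration of the idea, not the Keras change I mentioned):

from PIL import Image

def letterbox_to_square(img, size=224, fill=(0, 0, 0)):
    # Pad the shorter side with black borders so the image becomes square,
    # then resize without distorting the aspect ratio.
    w, h = img.size
    side = max(w, h)
    canvas = Image.new('RGB', (side, side), fill)
    canvas.paste(img, ((side - w) // 2, (side - h) // 2))  # center the original
    return canvas.resize((size, size), Image.BILINEAR)

# square = letterbox_to_square(Image.open('dog.1.jpg'))  # example filename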

No, that’s the Reshape layer: https://keras.io/layers/core/#reshape . The code that does the resizing is in keras.preprocessing.image.ImageDataGenerator
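For what it's worth, the resize is driven by the target_size argument when you create batches; something like this (the directory path is a placeholder):

from keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator()
# Every image is loaded and resized to target_size, whatever its original dimensions
batches = gen.flow_from_directory('data/train', target_size=(224, 224),
                                  batch_size=64, class_mode='categorical')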

Hi Jeremy,
What if I need to work with images of size 50x50? Do I need to train a network from scratch?
Please let me know.

Thank you,

I found the answer in the Lecture 7 video.
Thank you,