Input image dimensions

Hi @jeremy, @rachel,

I see that in VGG16 the input image dimensions are set to 224x224. Here are my questions regarding it:

  1. How is an image truncated/padded if it has different dimensions? Is there any historical evidence on which one is worse?
  2. Intuitively, it seems like we should try to minimize truncation/padding by setting the input dimensions to the mean dimensions of the images?
  3. For the redux competition, I did some analysis of the input image dimensions; here are the stats, with a quick sketch of the computation below this list:
    Dogs: mean = [399.3507, 374.5622], std = [107.4896, 100.5482]
    Cats: mean = [400.6467, 370.3812], std = [108.2237, 99.597]
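A rough sketch of that computation (the folder paths are just placeholders for however your training data is laid out):

import os
import numpy as np
from PIL import Image

def dimension_stats(folder):
    # Collect (width, height) for every image in the folder
    sizes = []
    for fname in os.listdir(folder):
        if fname.lower().endswith(('.jpg', '.jpeg', '.png')):
            with Image.open(os.path.join(folder, fname)) as im:
                sizes.append(im.size)  # PIL reports (width, height)
    sizes = np.array(sizes)
    return sizes.mean(axis=0), sizes.std(axis=0)

# e.g. dimension_stats('train/dogs') and dimension_stats('train/cats')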

It clearly seems like we can do better than 224x224. But I am curious about the standard deviation portion, and I would like to understand how we can minimize the loss of data (truncation) / added noise (padding).

Thanks,

Also, in vgg_preprocess I see vgg_mean = [123.68, 116.779, 103.939]. I can't seem to figure out where these numbers came from.

In the mnist example I see a different Lambda function to preprocess the input:
def norm_input(x): return (x-mean_px)/std_px

I would be interested in intuitively understanding why this is a good thing (http://www.d.umn.edu/~deoka001/Normalization.html).
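For context, here is a minimal sketch of how that Lambda gets wired into a model (assuming Theano-style channels-first ordering; the layers after the Lambda are just placeholders):

import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Lambda, Flatten, Dense

(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = np.expand_dims(X_train, 1).astype(np.float32)  # -> (N, 1, 28, 28)

# Scalar statistics computed once from the training pixels
mean_px = X_train.mean().astype(np.float32)
std_px = X_train.std().astype(np.float32)

def norm_input(x):
    # Zero-center and rescale every pixel with the training-set stats
    return (x - mean_px) / std_px

model = Sequential([
    Lambda(norm_input, input_shape=(1, 28, 28)),  # normalization baked into the model
    Flatten(),
    Dense(10, activation='softmax'),  # placeholder head
])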

Answers to your 1st question are here: Position, Size, Orientation, and Image Dimension

These numbers are the average pixel value of each channel in the imagenet dataset. They are provided by the VGG authors.
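Concretely, the preprocessing just subtracts those per-channel means from every pixel (and flips the channels from RGB to BGR, since the original weights were converted from Caffe). A rough sketch, assuming channels-first batches, not the exact course code:

import numpy as np

# Per-channel ImageNet means published by the VGG authors (RGB order)
vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3, 1, 1))

def vgg_preprocess(x):
    # x: batch of shape (batch, 3, height, width), channels in RGB order
    x = x - vgg_mean       # zero-center each channel with the ImageNet mean
    return x[:, ::-1]      # RGB -> BGR, matching the converted Caffe weights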

There are some pointers here: http://stats.stackexchange.com/questions/41704/how-and-why-do-normalization-and-feature-scaling-work

Thanks Jeremy!

So, to summarize my understanding so far:

  1. The reason most networks use 224x224 is that ImageNet uses those dimensions.
  2. If an image is larger than 224x224, Keras downsamples it.
  3. If an image is smaller than 224x224, Keras stretches it.

It seems like how we resize the images can have a non-trivial impact on accuracy. I am trying to find where this downsampling and stretching is implemented in Keras. Is it this one: https://github.com/fchollet/keras/blob/master/keras/layers/core.py#L255

With respect to the Lambda function, do you recommend always using vgg_preprocess when reusing/retraining the VGG model?

Yes, except that if it's too big, it doesn't just downsample it, it "squishes" it if it's not already square. This has a big impact on performance, and probably isn't a good idea - the original VGG authors instead added black borders to make it square, which I think is better. I've started working on adding this to Keras.
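To picture the black-border approach, here's a quick sketch with PIL (just an illustration of the idea, not the Keras change I mentioned):

from PIL import Image

def letterbox_to_square(img, size=224, fill=(0, 0, 0)):
    # Pad the shorter side with black borders so the image becomes square,
    # then resize without distorting the aspect ratio.
    w, h = img.size
    side = max(w, h)
    canvas = Image.new('RGB', (side, side), fill)
    canvas.paste(img, ((side - w) // 2, (side - h) // 2))  # center the original
    return canvas.resize((size, size), Image.BILINEAR)

# square = letterbox_to_square(Image.open('dog.1.jpg'))  # example filename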

No, that’s the Reshape layer: https://keras.io/layers/core/#reshape . The code that does the resizing is in keras.preprocessing.image.ImageDataGenerator
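For what it's worth, the resize is driven by the target_size argument when you create batches; something like this (the directory path is a placeholder):

from keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator()
# Every image is loaded and resized to target_size, whatever its original dimensions
batches = gen.flow_from_directory('data/train', target_size=(224, 224),
                                  batch_size=64, class_mode='categorical')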

Hi Jeremy,
What if I need to work with images of size 50x50? Do I need to train a network from scratch?
Please let me know.

Thank you,

I found the answer in the Lecture 7 video.
Thank you,