Choosing input dimensions

I’m using a set of images that have different dimensions. When defining an architecture from scratch (i.e. not fine-tuning), I need to choose the input dimensions. How can I decide which image dimensions to use for the input dimensions?

I’m asking with respect to the first part of Jeremy’s general method for deep learning:
“1. Use intuition. 2. Experiment.”
I’d like to improve my intuition / reasoning for this question.

Here’s a histogram of the max image dimension of the images I’m using:
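Roughly, here’s how the per-image max dimension and the summary statistics can be collected (the folder path and .jpg extension below are placeholders):

from pathlib import Path
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

image_dir = Path('data/train')   # placeholder -- point at the real image folder

# Max of (width, height) for each image -- the quantity plotted in the histogram
max_dims = [max(Image.open(p).size) for p in image_dir.glob('*.jpg')]

plt.hist(max_dims, bins=50)
plt.show()

print('min:   ', np.min(max_dims))
print('max:   ', np.max(max_dims))
print('mean:  ', np.mean(max_dims))
print('median:', np.median(max_dims))
print('mode:  ', Counter(max_dims).most_common(1)[0][0])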

My current intuition says:

  • Min feels like it would lose too much information
  • Max seems good, but maybe that much padding would decrease performance
  • Median and mode seem like a way to overfit
  • Mean also seems like a way to overfit, although I’d be surprised if it were as bad as median and mode

(the above thoughts weren’t based on experiments)

My current conclusion:

  • Go with max

My operational conclusion:

  • Just try 'em all.

Does anyone have any thoughts or tips on choosing input dimensions?

Going with max led me to memory errors: Trainable params: 106,968,523

I’m going with mean for now, which allows me to train: Trainable params: 3,379,019

I’d go with max, since these are small anyway. Keras will stretch the images rather than add padding.
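For example, the standard flow_from_directory pipeline simply resizes every image to target_size, with no padding involved (the directory name is a placeholder):

from keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator()
# Each image is stretched/squashed to target_size when it is loaded -- no padding is added
batches = gen.flow_from_directory('data/train', target_size=(224, 224), batch_size=64)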


A CNN needn’t have more params for larger inputs. Try a VGG-style network (see the vgg source for details)

Are you referring to fine-tuning when you suggest a VGG-style network?

I’m struggling to make the number of network parameters independent of input size. I’ve copied the code from vgg16.py and here are my results:

224x224 input: Trainable params: 138,357,544
52x52 input: Trainable params: 37,694,248

My understanding is this:

For each local receptive field there exists an additional neuron per filter. By increasing the size of the input, one increases the number of local receptive fields, and so the number of neurons, and so the number of parameters.

The parameter difference is from the Dense layers at the end, not the convolutional layers.
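To see where the gap comes from, assume the standard VGG16 head (Flatten, two Dense(4096) layers, then Dense(1000)). After the five 2x2 poolings, a 224x224 input leaves a 7x7x512 feature map, while a 52x52 input (52 → 26 → 13 → 6 → 3 → 1, since the convs use 'same' padding) leaves only 1x1x512, so the first Dense layer shrinks dramatically:

feats_224 = 7 * 7 * 512   # 25088 inputs to the first Dense(4096) for 224x224 images
feats_52  = 1 * 1 * 512   # 512 inputs for 52x52 images
print((feats_224 - feats_52) * 4096)   # 100663296 -- exactly the gap between 138,357,544 and 37,694,248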

If you use nothing but convolutional layers with a global average layer at the end, you actually don’t have to specify image size at all.

Keras allows spatial dimensions to be None, but as far as I know this will only work if images in each batch are all the same size. So you could either bin your images or use batch size 1.

FYI this is just a simple example network, unlikely to be useful. Also, I’m using TF dim ordering, not TH, i.e. channels as the last axis.

from keras.models import Model
from keras.layers import Input, Convolution2D, MaxPooling2D, GlobalAveragePooling2D

# None for the spatial dims: the model accepts images of any size (TF ordering, channels last)
img_input = Input(shape=(None, None, 3))

x = Convolution2D(32, 3, 3, border_mode='same')(img_input)
x = MaxPooling2D((2, 2))(x)
x = Convolution2D(64, 3, 3, border_mode='same')(x)
x = MaxPooling2D((2, 2))(x)
x = Convolution2D(128, 3, 3, border_mode='same')(x)
# Global average pooling collapses the variable spatial dims to a fixed-length 128-vector
output = GlobalAveragePooling2D()(x)

model = Model(input=img_input, output=output)
print(model.summary())
'''
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_12 (InputLayer)            (None, None, None, 3) 0                                            
____________________________________________________________________________________________________
convolution2d_17 (Convolution2D) (None, None, None, 32 896         input_12[0][0]                   
____________________________________________________________________________________________________
maxpooling2d_10 (MaxPooling2D)   (None, None, None, 32 0           convolution2d_17[0][0]           
____________________________________________________________________________________________________
convolution2d_18 (Convolution2D) (None, None, None, 64 18496       maxpooling2d_10[0][0]            
____________________________________________________________________________________________________
maxpooling2d_11 (MaxPooling2D)   (None, None, None, 64 0           convolution2d_18[0][0]           
____________________________________________________________________________________________________
convolution2d_19 (Convolution2D) (None, None, None, 12 73856       maxpooling2d_11[0][0]            
____________________________________________________________________________________________________
globalaveragepooling2d_1 (Global (None, 128)           0           convolution2d_19[0][0]           
====================================================================================================
Total params: 93,248
Trainable params: 93,248
Non-trainable params: 0

'''

Output will be (None, 128) and it will accept images of any size.

N.B. You won’t be able to use the default keras image preprocessing objects unless you tweak them a bit.
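One way around that is a bare-bones generator that yields one variable-size image per batch (paths and labels below are placeholders):

import numpy as np
from PIL import Image

def image_generator(paths, labels):
    # Batch size 1: each image keeps its own dimensions, so shapes never need to match
    while True:
        for path, label in zip(paths, labels):
            x = np.asarray(Image.open(path), dtype='float32') / 255.
            yield x[None, ...], np.array([label])   # add the batch axis

Something like this can then be passed to fit_generator; since every batch holds a single image, no two images ever have to share a shape.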


My understanding was wrong. Because of weight sharing, increasing the number of neurons does not have to increase the number of parameters (weights). @davecg is right: the parameter difference is from the Dense layers at the end, not the convolutional layers.
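For the record, here’s the per-layer arithmetic behind that: a conv layer has kernel_h * kernel_w * in_channels * filters + filters parameters, with no dependence on the spatial input size:

def conv_params(kh, kw, in_ch, filters):
    # Shared weights: one kernel per filter, reused at every spatial position
    return kh * kw * in_ch * filters + filters

print(conv_params(3, 3, 3, 32))    # 896, matches the first conv layer in the summary above
print(conv_params(3, 3, 32, 64))   # 18496
print(conv_params(3, 3, 64, 128))  # 73856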

Thank you, David.