VGG16 with different image resolutions

Hello,
I’ve been trying to use VGG16 to classify a fashion dataset and have been somewhat successful when testing on very different categories (e.g. shoes vs. jeans vs. t-shirts).
I noticed that in lesson 1, all images are resized to 224 x 224 pixels before being fed to the model for training.

I tried changing these values to the actual size of my images, but I got some errors.

Can someone explain why the image size is important and what one should do to change it? I’d like to be able to use images with a 2:3 aspect ratio.

Thanks!


I’ve had success applying (a part of) VGG16 to images of a different size.

The thing is, in a convolutional layer the same kernel weights are applied at every position of the image, so the actual image dimensions do not matter. Or to put it differently: a convolution is a translation-invariant operation; it does not depend on where in the image it is applied.

That means you can freely resize the input to a Conv2D layer, and its output will change size accordingly. Of course, the original VGG16 pipes into a few Dense layers that do depend on the input size, and they are also trained for a specific dimension.
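
To illustrate, here’s a minimal sketch (not the course code; Keras functional API, channels-first ordering like the course’s Theano setup) where the spatial dimensions are simply left unspecified:

from keras.layers import Input, Conv2D
from keras.models import Model

# The same conv layer works for any height/width: both are left as None here.
inp = Input(shape=(3, None, None))
out = Conv2D(64, (3, 3), activation='relu', padding='same',
             data_format='channels_first')(inp)
Model(inp, out).summary()  # output shape: (None, 64, None, None)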

You can also see that the ConvBlocks in Jeremy’s VGG code do not mention the input size anywhere:

model.add(Conv2D(filters, (3, 3), activation='relu'))

The only place where the input size is taken into account is in the very first layer that applies the initial image preprocessing:

model.add(Lambda(vgg_preprocess, input_shape=(3,224,224), output_shape=(3,224,224)))

Keras is smart: it propagates this shape through the rest of the model, but you are free to modify it. However, if you do, you can no longer use the pre-trained weights of the fully connected blocks, because they were trained to accept the dimensions output by the final conv block (7x7x512, if I am not mistaken).
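
To make the numbers concrete, here’s a rough sketch of the conv stack (one conv per block for brevity, so it is not the real VGG16 definition) showing how the flattened output grows with the input size:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten

# Rough VGG-like stack: five blocks, each ending in 2x2 max pooling
# (only one conv per block, for brevity; channels-first ordering).
def conv_stack(input_shape):
    model = Sequential()
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same',
                     input_shape=input_shape, data_format='channels_first'))
    model.add(MaxPooling2D((2, 2), data_format='channels_first'))
    for filters in [128, 256, 512, 512]:
        model.add(Conv2D(filters, (3, 3), activation='relu', padding='same',
                         data_format='channels_first'))
        model.add(MaxPooling2D((2, 2), data_format='channels_first'))
    model.add(Flatten())
    return model

print(conv_stack((3, 224, 224)).output_shape)  # (None, 25088)  = 512*7*7
print(conv_stack((3, 400, 600)).output_shape)  # (None, 110592) = 512*12*18

So a first Dense layer pretrained against 25088 inputs cannot accept the 110592 values produced by 400x600 images.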


Hmm… I tried it again, changing the input and output shapes, and it looks like it worked. I don’t know what I did wrong the first time…
Thanks for the explanation, @noio.

Hello again…
It seems I was mistaken… this isn’t working with images of a different resolution/aspect ratio. I’ve been trying to figure out how to fix it for the past 8 hours and it has beaten me :slight_smile:

Here’s what I did:

  1. Built a new dataset with images at 600x400 resolution (3:2 aspect ratio)
  2. Modified vgg16.py -> create() to use input_shape=(3, 400, 600) and output_shape=(3, 400, 600)
  3. Modified vgg16.py -> get_batches() to call flow_from_directory with target_size=(400, 600) (both changes are sketched below)
  4. Tried finetuning the vgg16 model with my new dataset.
    The error I get is here: https://www.dropbox.com/s/oeewuz01niqeyjf/Screenshot%202017-08-25%2016.36.26.png
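
For reference, the changes from steps 2 and 3 boil down to roughly this (a standalone sketch rather than the exact vgg16.py code; the path is hypothetical, an identity Lambda stands in for vgg_preprocess, and note that Keras expects target_size as (height, width), so 600x400 images become (400, 600)):

from keras.models import Sequential
from keras.layers import Lambda
from keras.preprocessing.image import ImageDataGenerator

size = (400, 600)  # (height, width) for 600x400 images

# Step 2 equivalent: the model's first layer accepts the new shape
model = Sequential()
model.add(Lambda(lambda x: x, input_shape=(3,) + size, output_shape=(3,) + size))

# Step 3 equivalent: batches are resized to the same size
gen = ImageDataGenerator(data_format='channels_first')
batches = gen.flow_from_directory('data/fashion/train',  # hypothetical path
                                  target_size=size, class_mode='categorical',
                                  batch_size=64)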

I have tried:

  • changing the MaxPooling2D layers to use pool_size (3, 2) (I figured I’d end up with the same shape as for square images if I did this here). This changed the error a bit - the shapes of the mismatched matrices changed - but I haven’t been able to get what I need. I also don’t think this is the proper solution.
  • changing the final Dense/Dropout layer declarations…

Help/explanations much appreciated!

It looks like the Dense layers do not match the last convolutional layer. Did you replace the Dense layers from VGG with new layers of the correct size?

Hi,
It looks like the Dense layers are constructed without an input_dim param:

model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))

Dense layer constructor:

def __init__(self, output_dim, init='glorot_uniform',
             activation=None, weights=None,
             W_regularizer=None, b_regularizer=None, activity_regularizer=None,
             W_constraint=None, b_constraint=None,
             bias=True, input_dim=None, **kwargs):

I’ll try setting input_dim explicitly, but isn’t it supposed to adapt to the input size automatically?

The input dimension is inferred from the previous layer. If you use the pretrained weights, that dimension is determined by the original image height and width (after passing through the conv/pooling stack), so you have to replace the Dense layers with new ones.
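
For example, a minimal sketch of that replacement (assuming the Sequential VGG from the course, already built with input_shape=(3, 400, 600) and ending in a Flatten followed by five Dense/Dropout layers; num_classes is hypothetical):

from keras.layers import Dense, Dropout

num_classes = 10  # hypothetical: number of fashion categories

# `model` is assumed to end in [..., Flatten, Dense, Dropout, Dense, Dropout, Dense].
# Remove the five ImageNet-sized Dense/Dropout layers (the Flatten can stay):
for _ in range(5):
    model.pop()

# Add a fresh head; Keras infers the first Dense layer's input size from the
# Flatten output (512*12*18 for 400x600 inputs).
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

# Optionally freeze the conv part so only the new head trains at first.
for layer in model.layers[:-5]:
    layer.trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])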

OK, I am now removing the last 5 layers (all the Dense/Dropout layers) when finetuning and then adding them back.
If I understand correctly, this means only the convolutional layers use the ImageNet-trained weights, while all the final Dense layers are retrained with fresh weights.
Thanks for the info, @marcemile