VGG16: change the model’s input size to (256,256) or resize my images to (224,224)?

Is it better to use the default input size of (224,224) and resize my images to match (I’m trying to adapt the dogs and cats model to compete here: https://www.hackerearth.com/challenge/competitive/deep-learning-challenge-1/), or to modify VGG16 to accept images of size 256x256? What consequences does that choice have on the model?

I have tried the VGG16 model from Keras and I’m not getting great results, but I believe part of the reason is that the competition isn’t letting me use the ImageNet images to give the model a good starting point. This makes me lean towards cropping all of my pictures down to 224x224, but I wanted to see whether the community would come to the same conclusion. Not sure if this is useful, but this is what I’m using:

from keras.applications import VGG16

# training from scratch (no ImageNet weights): 25 classes, 256x256 RGB inputs
vgg = VGG16(weights=None, classes=25, input_shape=(256, 256, 3))
vgg.compile(optimizer='adam', loss='categorical_crossentropy')

Having slightly larger source images is useful for doing basic data augmentation. For example, you can train your CNN by taking several random 224x224 crops from the same 256x256 image.

You can also use this trick for making predictions: instead of resizing the source image to 224x224, make predictions on the four corner crops and the center crop, and average those.
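Here’s a minimal sketch of both ideas in NumPy, assuming images arrive as height x width x channel arrays at least 224 pixels on each side; random_crop and five_crop_predict are just illustrative helper names, and model stands for any Keras model trained on 224x224 inputs.

import numpy as np

def random_crop(img, size=224):
    # pick a random top-left corner inside the image and cut out a size x size patch
    h, w = img.shape[:2]
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    return img[top:top + size, left:left + size]

def five_crop_predict(model, img, size=224):
    # average the model's predictions over the four corner crops and the center crop
    h, w = img.shape[:2]
    corners = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
               ((h - size) // 2, (w - size) // 2)]
    crops = np.stack([img[t:t + size, l:l + size] for t, l in corners])
    return model.predict(crops).mean(axis=0)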

That said, VGG should work fine with 256x256 inputs directly, though you may need to specify pooling='avg' before your classification layer (that way VGG can handle inputs of any size).
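For example, a sketch of your model rebuilt that way with the Keras functional API (the variable names are mine):

from keras.applications import VGG16
from keras.layers import Dense
from keras.models import Model

# include_top=False drops VGG's fixed-size fully connected layers;
# pooling='avg' averages the final feature map into one 512-dimensional vector,
# so the convolutional part no longer depends on the exact input size
base = VGG16(weights=None, include_top=False, pooling='avg', input_shape=(256, 256, 3))
predictions = Dense(25, activation='softmax')(base.output)
model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy')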


It makes sense that this would work with 256x256 versus the original 224x224, but how do you think it would work with much larger images like Statefarm’s? I think those are 600x480, so each crop would cover a much smaller fraction of the picture.

I would like to get some more mileage out of my data augmentation, which currently is very conservative because the Statefarm images are so similar.

If your model uses global average pooling before the “dense” classification layer, then it does not matter how large the inputs are. You can input a 600x480 image and the model will give a prediction for the full image.
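A rough illustration of the “any size” point, with a toy model rather than VGG (the layer sizes are made up):

from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense
from keras.models import Model
import numpy as np

# height and width are left as None, so the model accepts any image size
inputs = Input(shape=(None, None, 3))
x = Conv2D(32, (3, 3), activation='relu')(inputs)
x = GlobalAveragePooling2D()(x)              # any HxW feature map -> fixed-length vector
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs, outputs)

print(model.predict(np.zeros((1, 480, 600, 3))).shape)   # (1, 10)
print(model.predict(np.zeros((1, 224, 224, 3))).shape)   # (1, 10)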

However, if you wanted to take 224x224 crops from the 600x480 image, you could first resize it so the smallest side is 256. That would make the input image 320x256. Now you can take 224x224 crops from this resized image.
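A quick sketch of that with Pillow, assuming the image is loaded from a file (the filename is just a placeholder):

from PIL import Image

def resize_shortest_side(img, target=256):
    # scale the image so its shortest side equals `target`, keeping the aspect ratio
    w, h = img.size
    scale = target / min(w, h)
    return img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)

img = resize_shortest_side(Image.open('driver.jpg'))          # e.g. 600x480 -> 320x256
left = (img.width - 224) // 2
top = (img.height - 224) // 2
center_crop = img.crop((left, top, left + 224, top + 224))    # center 224x224 crop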

Would that prediction likely be better, or are you just saying it is possible? Also, could you explain what resizing to 256 might accomplish?

Right now I’m at the point where I understand these things are possible, but I don’t know if there is some guideline for when resizing is useful.

It depends on what you’re using the neural network for.

Suppose you want to produce an output that has the same size as the input image (segmentation, style transfer, etc). In that case, you want to use the original image as the input to your network, not a resized version.

But for something like classification or object detection, it’s OK to use a resized version. You could use the original image size but that is much slower and doesn’t necessarily give better results.

If your input image is not square, then resizing it to 224x224 will make it look squashed, i.e. it distorts the contents of the image. Therefore, people like to resize the image while keeping the aspect ratio intact, and then take a square crop from the image.

And as I said, using a slightly larger image to make several smaller crops is a simple way to do data augmentation.


Ok thanks!

So Statefarm, for example, might be better off getting scaled down instead of squashed into a square.