Why reshape(3,224,224) in vgg16.py?

I’m trying to walk through the code and understand why we start with an input_shape of (3,224,224).

I’m assuming that:

3 = the channels (BGR)
224 = image width
224 = image height

and also …

what does the “x” in def vgg_preprocess(x) refer to?

I see the function is called from model.add(Lambda(vgg_preprocess, input_shape=(3,224,224))) … and so I’m assuming it is called for each image? If so, what images is it acting on, since at this point in the model creation no images have been loaded via the call to vgg.get_batches().

I’m sure I’m missing something here so sorry for the confusion.

You assume right: the ImageDataGenerator creates RGB images of size 3x224x224. The “x” is a parameter passed into the vgg_preprocess Python function (https://www.tutorialspoint.com/python/python_functions.htm). Basically, vgg_preprocess receives an image “x” and does two processing steps: first it subtracts the average R, G and B values, as was done by the people who created the ImageNet VGG16 model, and second it changes the order of the channels from RGB to BGR (I believe because the software used by the ImageNet competitors loads images with OpenCV, which uses BGR as the default channel order).
The batch processing of images is a bit difficult to get to grips with at the beginning, but it is quite simple: everything is done in the background by the Keras image preprocessing module (https://keras.io/preprocessing/image/), which is in charge of “feeding” the model fit with images from the training directory.
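Here is a minimal sketch of how that feeding works, assuming a Keras 1.x setup with “th” (channels-first) dim ordering and a hypothetical data/train directory containing one sub-folder per class:

from keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator()
batches = gen.flow_from_directory('data/train', target_size=(224, 224),
                                  batch_size=64, class_mode='categorical')

# Each iteration yields an (images, labels) tuple; with "th" dim ordering the
# images array has shape (batch_size, 3, 224, 224), which is what the
# Lambda(vgg_preprocess, input_shape=(3,224,224)) layer receives while fitting.
imgs, labels = next(batches)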


cool. thanks again for the good info!

Question…

  1. Looking at the x[:, ::-1] part of the vgg_preprocess() function. Shouldn’t it be x[::-1, :], since the shape of vgg_mean is (3,1,1), not (1,1,3)? And the images we input using get_batches() also have the channels-first shape.

  2. Shouldn’t the comment say rgb->bgr, instead of bgr->rgb?

For reference from lesson1.ipynb:

# Mean of each channel as provided by VGG researchers
vgg_mean = np.array([123.68, 116.779, 103.939]).reshape((3,1,1))

def vgg_preprocess(x):
    x = x - vgg_mean     # subtract mean
    return x[:, ::-1]    # reverse axis bgr->rgb

thanks!

@lev the first dimension of the arrays produced by the ImageDataGenerator is the sample number, as processed by the generator. From https://keras.io/preprocessing/image/:

dim_ordering: One of {“th”, “tf”}. “tf” mode means that the images should have shape (samples, height, width, channels), “th” mode means that the images should have shape (samples, channels, height, width). It defaults to the image_dim_ordering value found in your Keras config

Looking at the shape of vgg_mean can be misleading because it is broadcast during the subtraction operation (https://docs.scipy.org/doc/numpy-1.10.0/user/basics.broadcasting.html),

so the code is doing the correct change on the second (channel) dimension. For example:
import numpy as np

x = np.ones((1,3,5,5), dtype=int)   # (samples, channels, height, width)
print(x)

vgg_mean = np.array([123.68, 116.779, 103.939]).reshape((3,1,1))

x = x - vgg_mean   # vgg_mean (3,1,1) broadcasts over the height/width axes
print(x)

x = x[:, ::-1]     # reverses the channel axis; the sample axis is untouched
print(x)

Regarding 2., I believe you are right and it is just an erratum.


@Gelu74, nice, thank you!

I missed the first dimension from the image generator, of course, because I was only looking at vgg_mean.

You set me on the right track by pointing out the numpy broadcasting behavior. It’s pretty easy to trip over that and slicing/indexing of multi-dimensional arrays. Appreciate that.

Regarding 2., I believe you are right and it is just an erratum.

If that’s the case, I think it should be fixed. When you’re starting out, you tend to hold on to every single bit of information as you try to understand. Even in such a simple case - it can easily make you start second guessing yourself. Even when you already get it. :slight_smile: @jeremy

I’ve tried to build VGG16 from scratch:

vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3,1,1))

def vgg_preprocess(x):
    x = x - vgg_mean     # subtract the per-channel mean of the ImageNet images
    return x[:, ::-1]

model.add(Lambda(vgg_preprocess, input_shape=(3,224,224), output_shape=(3,224,224)))
model.add(ZeroPadding2D(padding=(1,1)))

Why is the size (5, 226, 224) instead of (3, 226, 224)? From model.summary():

zeropadding2d_4 (ZeroPadding2D) (None, 5, 226, 224) 0 lambda_3[0][0]
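In case it helps: one plausible cause (an assumption on my part, not confirmed in this thread) is the image_dim_ordering setting in the Keras config. In “tf” mode Keras treats the input as (height, width, channels), so ZeroPadding2D pads the first two axes of (3, 224, 224), turning 3 into 5 and 224 into 226 while leaving the last axis at 224. A quick sketch to check, forcing “th” ordering on the layer (Keras 1.x API assumed):

from keras.models import Sequential
from keras.layers import ZeroPadding2D

model = Sequential()
# Force channels-first interpretation so the padding lands on height/width
model.add(ZeroPadding2D(padding=(1,1), dim_ordering='th', input_shape=(3,224,224)))
print(model.output_shape)   # (None, 3, 226, 226) when "th" ordering is applied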