Comparison of Keras's built-in VGG16, ResNet50 & Inception V3 on Cats vs Dogs & suspicions about preprocessing

After building my first few models of cats vs dogs for the Kaggle competition, I got curious about how well some of the other ImageNet solutions perform as starting points for transfer learning. I was also curious about other architectures for image processing. I looked at VGG16, ResNet50 & Inception V3, and also compared @jeremy’s Vgg16 wrapper to the built-in Keras function.

My starting point was Jeremy’s Vgg16 with some data augmentation and a learning rate of 0.001. After some experimentation, my best model achieved an accuracy of 99.1% on both training and validation after 2 epochs, which is a great starting point. My hope was that the other architectures would give similar performance and I’d be able to ensemble them, but that wasn’t the case.

Next up was Inception V3, which after more experimentation maxed out for me at around 96.5% after 2 epochs at a learning rate of 0.001 and 6 epochs at 0.0001. I was a little disappointed with the performance, and I’m still a little suspicious that there might be some undocumented image preprocessing that the Keras implementation isn’t doing (more on that suspicion later).

ResNet50 performed a little better, achieving 98.6% validation and training accuracy after 3 epochs at 0.001 and 6 epochs at 0.0001.

Finally, the Keras VGG16 implementation reached 97% validation and training accuracy after 2 epochs, which is much lower than the implementation by @jeremy. I’m almost certain now that what’s missing is the proper preprocessing layer, but I’m struggling to insert that layer into the existing models. I’ve tried creating a Lambda layer and then adding the VGG16 model to it, but it doesn’t seem to connect properly to the model.

Any thoughts? I would like to be able to use these other models natively, rather than have to build them from scratch, and if the native implementations really don’t have the correct preprocessing that should probably be corrected in the library.

Here’s the code for the Lambda layer preprocessing that I’m struggling with:

preprocess = Sequential()
preprocess.add(Lambda(vgg_preprocess, input_shape=(3,224,224)))
preprocess.add(vgg) #vggmodel from keras

x = vgg.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(2, activation='softmax')(x)

model = Model(input=preprocess.input, output=predictions)

This gives me an error saying that there are multiple outputs. If I move the preprocess block after the assignment of predictions, I get an error saying that the graph is disconnected and it can’t obtain a value for the input to vgg. There’s probably an easy way to do this or something obvious I’m missing, but I’ve been digging around the Keras forums/GitHub for a while to no avail.


I just looked into the source of the Keras VGG16 implementation and there’s a preprocess function that does exactly what the Lambda function does.

My only thought now is that the preprocess function isn’t being called, and I can’t figure out how I should call it when using batches and the ImageDataGenerator.

The ImageDataGenerator init function has a preprocessing_function parameter that will probably work (haven’t looked at that particular lambda function so not sure).

preprocessing_function: function that will be implied on each input.
The function will run before any other modification on it.
The function should take one argument:
one image (Numpy tensor with rank 3),
and should output a Numpy tensor with the same shape.

So there is! Where did you read about that? I googled away and searched extensively on the Keras GitHub and there are no references to it. It’s not even listed as an input parameter on the ImageDataGenerator page.

Strangely, the built-in preprocess functions don’t meet those requirements, but I should hopefully be able to figure it out from here.
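For anyone hitting the same mismatch, here is a minimal sketch of one way to bridge it, assuming the problem is that a preprocess function expects a 4D batch while preprocessing_function receives a single rank-3 image. The names per_image and flip_channels_batch are hypothetical, and flip_channels_batch is only a stand-in for a batch-oriented function, not the Keras one.

```python
import numpy as np


def per_image(batch_fn):
    """Wrap a function that expects a 4D batch so it accepts one rank-3 image."""
    def wrapped(image):
        batch = image[np.newaxis, ...]   # (3, H, W) -> (1, 3, H, W)
        return batch_fn(batch)[0]        # drop the batch axis again
    return wrapped


def flip_channels_batch(batch):
    """Hypothetical stand-in: reverse the channel axis of a channels-first batch."""
    return batch[:, ::-1, :, :]


# Adapt the batch function, then apply it to a single small dummy image.
single_image_fn = per_image(flip_channels_batch)
image = np.arange(3 * 2 * 2, dtype="float32").reshape(3, 2, 2)
out = single_image_fn(image)
```

The wrapped function returns an array with the same shape as its input, which is what the docstring quoted above asks for.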


I had to tweak the ImageDataGenerator class a lot for a project I was working on… Didn’t actually use that parameter but remembered seeing it when I was messing with the source code.

Thanks for sharing that. I’m not sure I would have found it otherwise.

The built-in preprocess functions don’t work, but I was able to repurpose Jeremy’s function and that seems to work just fine.
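A rough sketch of what a single-image version of that kind of function could look like, for reference. This is an assumption about the general shape rather than the exact code used: it assumes channels-first RGB images of shape (3, H, W) like the input_shape in the code above, and the commonly used ImageNet per-channel means for VGG. The name vgg_preprocess_single is made up for illustration.

```python
import numpy as np

# Commonly used ImageNet channel means for VGG, in RGB order (assumed values).
VGG_MEAN = np.array([123.68, 116.779, 103.939], dtype="float32").reshape(3, 1, 1)


def vgg_preprocess_single(image):
    """Preprocess one channels-first RGB image of shape (3, H, W)."""
    image = image - VGG_MEAN   # subtract the per-channel mean
    return image[::-1]         # reverse the channel axis: RGB -> BGR


# Dummy image whose red channel equals the red mean exactly.
example = np.zeros((3, 4, 4), dtype="float32")
example[0] = VGG_MEAN[0]
result = vgg_preprocess_single(example)
```

Because it takes one rank-3 image and returns the same shape, a function like this can be passed as the preprocessing_function argument of ImageDataGenerator.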

I’m still not getting the best results with ResNet50, but that could just be because the more complex architecture requires more training.

Have you looked at vgg16 with batch normalization (vgg16bn)? It trains so much faster and better in my experience. If you are training from scratch, I would also look into improving weight initialization (he_uniform vs. glorot_uniform).

I took a look at vgg16bn last night and wasn’t able to get the same performance out of it that I’m getting out of vgg16. I’m getting accuracy levels better than those Jeremy cited in class (99.1% on both training and test, in just 2 epochs) by using a modified version of Vgg16 that contains no dropout. The vgg16bn model is underfitting, even with no dropout, and I’m not sure how to improve upon the results (96.5% on train, 98.7% on validation). I’ve tried training longer and it converges at around 98.5%.

It did raise some questions for me that I’m going to post to the forum shortly, since I think they’d be of interest to everyone.