Difference between Jeremy's VGG16 and Keras' built-in VGG16

I was exploring Keras a bit further and found that VGG16 is also available to be loaded directly from the Keras API:

https://keras.io/applications/#vgg16

Upon inspecting the pre-trained network provided in platform.ai/models vs. the one available in Keras, the two networks appear to have different architectures. For example, the VGG16 provided by Jeremy has dropout layers in between the dense layers, whereas the VGG16 network in keras.applications does not.

Is there any reason for this difference? I find that Jeremy's network works better for me - did he add the dropout layers himself? For students who also want to see what I'm talking about, you can view the difference in architectures yourself by doing this:

To view Jeremy’s vgg16 architecture:

from vgg16 import Vgg16
vgg = Vgg16()
vgg.model.summary()

To view the keras vgg16 architecture:

from keras.applications.vgg16 import VGG16
keras_vgg16 = VGG16()
keras_vgg16.summary()
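
For an even quicker side-by-side check, you can print just the layer class names (a small sketch, assuming both models from the snippets above are already loaded):

print([type(l).__name__ for l in vgg.model.layers])    # includes 'Dropout' entries
print([type(l).__name__ for l in keras_vgg16.layers])  # no 'Dropout' entries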

Another question is, where did Jeremy's VGG16 pre-trained network come from? Did he pre-train it himself or did he grab it from a resource somewhere? Is there a repo of pre-trained networks that anyone recommends? Thanks!

3 Likes

I grabbed the weights from https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3 and made some minor changes. The original VGG network used dropout - I'm not sure why Keras removed it. In the course I show how to change the amount of dropout (or remove it) as needed for each application.
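
For reference, here is a minimal sketch of one way to do that - not the course's exact code. It assumes Keras 1 (where Dense exposes output_dim) and the lesson's Vgg16 wrapper, and rebuilds the fully connected block with a new dropout rate while reusing the trained weights. Whether the copied weights need rescaling depends on how dropout was applied when they were trained; the course covers this.

from vgg16 import Vgg16
from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout

vgg = Vgg16()
layers = vgg.model.layers
# everything up to and including Flatten is the convolutional part
split = [i for i, l in enumerate(layers) if type(l) is Flatten][0]

p = 0.3  # assumed new dropout rate
new_model = Sequential(layers[:split + 1])
for layer in layers[split + 1:]:
    if type(layer) is Dropout:
        new_model.add(Dropout(p))
    else:
        # recreate each Dense layer and copy its trained weights across
        new_model.add(Dense(layer.output_dim, activation=layer.activation))
        new_model.layers[-1].set_weights(layer.get_weights())

You'd then compile new_model as usual before training.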

2 Likes

@hamelsmu Were you able to get the built-in Keras VGG16 to train? I have been trying to use the built-in VGG16 but the training is abysmal…

@achiang I was able to get it to train and it performed the same for me. Do you want to post your code here?

import numpy as np
from keras.preprocessing import image

vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3,1,1))
# I tried using this and not using this in the image loading, makes no difference
def vgg_preprocess(x):
    x = x - vgg_mean
    return x
#    return x[:, ::-1] # reverse axis rgb->bgr

import vgg16 as jeremy_vgg16
BATCH_SIZE = 64
DATA_PATH = "data/dogscats/sample/"
batches = jeremy_vgg16.Vgg16().get_batches(DATA_PATH+'train',
                                           gen=image.ImageDataGenerator(preprocessing_function=vgg_preprocess),
                                           batch_size=BATCH_SIZE)
val_batches = jeremy_vgg16.Vgg16().get_batches(DATA_PATH+'valid',
                                               gen=image.ImageDataGenerator(preprocessing_function=vgg_preprocess),
                                               batch_size=BATCH_SIZE)

from keras.applications import vgg16 as keras_vgg16
from keras.models import Model
from keras.layers import Dense, Flatten, Input
from keras.optimizers import SGD, RMSprop, Adam

input_layer = Input(shape=(3, 224, 224),
                    name='image_input')
base_model = keras_vgg16.VGG16(weights='imagenet', include_top=False)
x = base_model(input_layer)
x = Flatten(name='flatten')(x)
x = Dense(4096, activation='relu', name='fc1')(x)
x = Dense(4096, activation='relu', name='fc2')(x)
predictions = Dense(2, activation='softmax', name='predictions')(x)

# this is the model we will train
keras_vgg = Model(input=input_layer, output=predictions)

# freeze all convolutional Vgg16 layers
for layer in base_model.layers:
    layer.trainable = False

# compile the model
keras_vgg.compile(optimizer=RMSprop(), loss='categorical_crossentropy', metrics=['accuracy'])

keras_vgg.fit_generator(batches,
                        validation_data=val_batches,
                        samples_per_epoch=batches.nb_sample,
                        nb_val_samples=val_batches.nb_sample,
                        nb_epoch=10)

And the output:

Found 40 images belonging to 2 classes.
Epoch 1/10
160/160 [==============================] - 2s - loss: 6.4060 - acc: 0.5187 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 2/10
160/160 [==============================] - 2s - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 3/10
160/160 [==============================] - 2s - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 4/10
160/160 [==============================] - 2s - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 5/10
160/160 [==============================] - 2s - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 6/10
160/160 [==============================] - 2s - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 7/10
160/160 [==============================] - 2s - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 8/10
160/160 [==============================] - 2s - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 9/10
160/160 [==============================] - 2s - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 10/10
160/160 [==============================] - 2s - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000

@achiang The problem is the two sequentially connected ReLU layers, fc1 and fc2. If you change one of them to another activation, e.g. softmax, your code will work properly.
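
That is, in the model definition above, change one of these two lines, e.g.:

x = Dense(4096, activation='softmax', name='fc1')(x)  # was activation='relu'
x = Dense(4096, activation='relu', name='fc2')(x)     # unchanged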

1 Like

@beniamin
Thanks a bunch! It worked when I changed the first layer, fc1, to softmax. (Curiously, not the other way around.) Do you know of a specific reason why two ReLU layers in a row don't work?

1 Like

I am curious about this as well… As @achiang mentioned, the fitting progress improves drastically when the activation of the first dense layer is changed to softmax, and also when I use just one softmax layer instead of the two ReLU layers (results below).

Also, it seems @jeremy's code from lesson 2 (the relevant section from vgg16.py is copied below) has two fully connected ReLU layers, albeit with 0.5 dropout after each, and seems to work fine.

@beniamin is there something I am missing that allowed you to identify the two back-to-back dense ReLU layers as problematic? Perhaps something to do with the magnitude of inputs from the convolutional layers being put into an unbounded function before being regularized/normalized by dropout/softmax?
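
One way to probe this (a diagnostic sketch, assuming the keras_vgg model and batches from the code posted above) is to look at the fc1 activations directly and check what fraction of the ReLU units are outputting zero:

from keras import backend as K

# a function from the model's input tensor to the fc1 activations
get_fc1 = K.function([keras_vgg.input],
                     [keras_vgg.get_layer('fc1').output])

x_batch, y_batch = next(batches)
acts = get_fc1([x_batch])[0]
print((acts == 0).mean())  # fraction of dead (zero) ReLU activations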

Thanks a lot!

Results with just one softmax layer between the convolution layers/flattening and the predictions, lr=0.05:

Epoch 1/10
160/160 [==============================] - 5s - loss: 0.6901 - acc: 0.6000 - val_loss: 0.6885 - val_acc: 0.5750
Epoch 2/10
160/160 [==============================] - 4s - loss: 0.6819 - acc: 0.6813 - val_loss: 0.6674 - val_acc: 0.8750
Epoch 3/10
160/160 [==============================] - 4s - loss: 0.6696 - acc: 0.8063 - val_loss: 0.6642 - val_acc: 0.8750
Epoch 4/10
160/160 [==============================] - 4s - loss: 0.6568 - acc: 0.8937 - val_loss: 0.6604 - val_acc: 0.8250
Epoch 5/10
160/160 [==============================] - 4s - loss: 0.6526 - acc: 0.9000 - val_loss: 0.6431 - val_acc: 0.9250
Epoch 6/10
160/160 [==============================] - 4s - loss: 0.6402 - acc: 0.9625 - val_loss: 0.6382 - val_acc: 0.9250
Epoch 7/10
160/160 [==============================] - 4s - loss: 0.6355 - acc: 0.9688 - val_loss: 0.6339 - val_acc: 0.9250
Epoch 8/10
160/160 [==============================] - 4s - loss: 0.6313 - acc: 0.9688 - val_loss: 0.6298 - val_acc: 0.9250
Epoch 9/10
160/160 [==============================] - 4s - loss: 0.6272 - acc: 0.9688 - val_loss: 0.6259 - val_acc: 0.9250
Epoch 10/10
160/160 [==============================] - 4s - loss: 0.6233 - acc: 0.9688 - val_loss: 0.6221 - val_acc: 0.9250

Relevant code from vgg16.py, lesson 2:

def FCBlock(self):
    model = self.model
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))


def create(self):
    model = self.model = Sequential()
    model.add(Lambda(vgg_preprocess, input_shape=(3,224,224)))

    self.ConvBlock(2, 64)
    self.ConvBlock(2, 128)
    self.ConvBlock(3, 256)
    self.ConvBlock(3, 512)
    self.ConvBlock(3, 512)

    model.add(Flatten())
    self.FCBlock()
    self.FCBlock()
    model.add(Dense(1000, activation='softmax'))

    fname = 'vgg16.h5'
    model.load_weights(get_file(fname, self.FILE_PATH+fname, cache_subdir='models'))

2 Likes

@achiang, I see a couple of odd details in your output:

  • 40 samples is definitely not enough to train a CNN in most cases

  • Your results stay the same after the 1st epoch. This makes me assume that you have 20 samples per class and your model is simply guessing a single class every epoch. If this is the case, it makes sense that it has an accuracy of 0.5000 (and, as checked just below, a loss stuck at 8.0590)
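
As an aside, the stuck loss of 8.0590 fits this exactly: Keras clips predicted probabilities to [eps, 1 - eps] with its default eps = 1e-7 before taking the log, so a model that always outputs one class with probability ~1 incurs a loss of -ln(1e-7) ≈ 16.118 on every sample of the other class and ~0 on the rest; averaged over a balanced set that is ≈ 8.059. A quick numeric check:

import numpy as np

eps = 1e-7                  # Keras' default clipping epsilon
wrong = -np.log(eps)        # loss on a sample of the never-predicted class
right = -np.log(1 - eps)    # loss on a sample of the always-predicted class
print((wrong + right) / 2)  # ~8.0590, the plateau in the logs above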

What I recommend:

  • Get more data/samples

  • Look at your model's predictions to confirm the second point (a sketch of how to do this follows this list)
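
For example, a sketch assuming the keras_vgg model and batches from @achiang's code above (val_samples is the Keras 1 argument name):

import numpy as np

preds = keras_vgg.predict_generator(batches, val_samples=batches.nb_sample)
print(preds[:5])                         # rows all ~[1, 0] or ~[0, 1] if stuck
print(np.unique(preds.argmax(axis=1)))   # a single class index if stuck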

It’s recommended that you use softmax strictly in the output layer as it is only supposed to return an “array” of probabilities per class. My intuition tells me that the model would simply adjust any hidden softmax layers to have no effect, i.e. for 2 classes, output a likelihood of 0.5 for each one.

How many samples are you training on?

Could you print both the results and the model’s architecture/layers and share this with us?

Hmm, I think @achiang and I shared your intuition initially, but we found that ReLU hidden layers were incapable of fitting anything in this (albeit small) example, while softmax hidden layers started fitting right away. It seems it was obvious to @beniamin that softmax was needed and that two ReLU layers would fail as hidden layers, but I don't know why.

We are both using the same small # of samples and architecture/layers per the code @achiang posted, with the hidden layers modified per the discussion.

The sample set is small but it has both classes. I think the small sample set doesn’t matter since the CNN layers are frozen.

Actually I tried it with the full-sized train/validation dogscats data provided in Week 1, but there was still no progress in training. And as @vinvinvin pointed out, with the softmax hidden layer it works on the sample dataset with no problems.

I get the same output.

Train shape: (2000, 3, 150, 150)
Validation shape: (800, 3, 150, 150)
Test shape: (100, 3, 150, 150)
Model loaded.
Train on 2000 samples, validate on 800 samples
Epoch 1/50
2000/2000 [==============================] - 42s - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 2/50
2000/2000 [==============================] - 39s - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 3/50
2000/2000 [==============================] - 40s - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000

I am extracting bottleneck features with the help of VGGNet, then training a small network on top of these bottleneck features.

model = Sequential()
model.add(Flatten(input_shape=train_features.shape[1:]))
model.add(Dense(1024, activation='softmax', name='fc1'))
model.add(Dense(1, activation='softmax', name='predictions'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(train_features, train_labels,
          nb_epoch=nb_epoch, batch_size=32, shuffle=True, verbose=1,
          validation_data=(validation_features, validation_labels))

Please tell me, what can be done in this case? And what is this value, loss: 8.0590 - acc: 0.5000, which I see in a lot of places?

Thanks to this amazing course and the Deep Learning with Python e-book from Francois Chollet himself, spotted and tweeted by our own @EricPB a couple of days ago, I am almost there with building my own nets in Keras. My problem at the moment is learning to code the pre-cooked architectures effectively with different datasets (e.g. the Kaggle Planet competition) and different parametric optimizers, for example hyperas. As for VGG16 and Jeremy's Lesson 2 Part 1, I am still trying to work out how to manage with my 16 GB of RAM; my local Python kernel just shuts down halfway at the moment.

When I tried to build a new model based on VGG16 on my Mac, I ran into the same problem. However I changed my model, it seemed stuck at 0.5 accuracy forever.
In the end I found a few things and fixed it; here are some tips from my solution. (I used the VGG16 model built into Keras 2, not the same one Jeremy used.)

  1. Do not use augmentation on the validation set.
    We should always be careful about this when using ImageDataGenerator on both the training and validation sets (a sketch follows this item).
    When I did the lesson 2 assignment on my Mac, I used just 40 examples each for the training and validation sets, which is pretty small for such a large model.
    And I applied augmentation to both the training and validation sets, which made my accuracy even less than 0.5.
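
    A sketch of what I mean, assuming the vgg_preprocess function from earlier in the thread (the augmentation parameters are just examples):

from keras.preprocessing.image import ImageDataGenerator

# augment only the training data; validation images pass through unchanged
train_gen = ImageDataGenerator(preprocessing_function=vgg_preprocess,
                               horizontal_flip=True,
                               zoom_range=0.1)
valid_gen = ImageDataGenerator(preprocessing_function=vgg_preprocess)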

  2. Use dropout and try different architectures.
    I tried many different architectures while doing this.
    The architecture below is built on bottleneck output from the Keras VGG16 model (without the last 3 FC layers).
    It may not be the best solution, but it is good enough for the 40 training examples. Also, it's much smaller than the VGG16 FC layers.

model = Sequential()
model.add(Dropout(0.5, input_shape=train_x.shape[1:]))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))

model.compile(optimizer=Adam(lr=0.0001),
              loss='categorical_crossentropy',
              metrics=["accuracy"])
model.summary()

  3. Try different learning rates, and start with a small value.
    I started training with lr=0.0001, not the Adam default of 0.001, and tried many different values.
    This is just an experiment on a small dataset, but it is enough before we train on the full dataset in a GPU environment. At least we found an approach that seems right for training a new model to solve the classification problem.
hist = model.fit(x=train_x, y=train_y,
                 batch_size=8,
                 epochs=10,
                 callbacks=cb,
                 validation_data=(valid_x, valid_y),
                 shuffle=True)

...
Epoch 8/10
80/80 [==============================] - 1s - loss: 0.5086 - acc: 0.7250 - val_loss: 0.4986 - val_acc: 0.7750
Epoch 9/10
80/80 [==============================] - 1s - loss: 0.4637 - acc: 0.7875 - val_loss: 0.4811 - val_acc: 0.7750
Epoch 10/10
80/80 [==============================] - 1s - loss: 0.4231 - acc: 0.7875 - val_loss: 0.5259 - val_acc: 0.7000

model.compile(optimizer=Adam(lr=0.00001),
              loss='categorical_crossentropy',
              metrics=["accuracy"])
hist2 = model.fit(x=train_x, y=train_y,
                  batch_size=8,
                  epochs=5,
                  callbacks=cb,
                  validation_data=(valid_x, valid_y),
                  shuffle=True)

Train on 80 samples, validate on 40 samples
Epoch 1/5
80/80 [==============================] - 2s - loss: 0.3601 - acc: 0.8750 - val_loss: 0.5078 - val_acc: 0.7000
Epoch 2/5
80/80 [==============================] - 1s - loss: 0.2431 - acc: 0.9125 - val_loss: 0.4992 - val_acc: 0.7000
Epoch 3/5
80/80 [==============================] - 1s - loss: 0.3448 - acc: 0.8250 - val_loss: 0.4837 - val_acc: 0.7500
Epoch 4/5
80/80 [==============================] - 1s - loss: 0.5362 - acc: 0.7250 - val_loss: 0.4959 - val_acc: 0.7000
Epoch 5/5
80/80 [==============================] - 1s - loss: 0.3767 - acc: 0.8125 - val_loss: 0.4927 - val_acc: 0.7000

Hope this will help someone, thanks. :stuck_out_tongue_closed_eyes:

The code below worked for me using Keras 2. Be sure to set include_top=True in the VGG Keras download.

Instead of popping off the original 1,000-category prediction layer from VGG, just connect your new prediction layer to the last fc layer in VGG:

# retrieve the full Keras VGG model including imagenet weights
vgg = VGG16(include_top=True, weights='imagenet',
            input_tensor=None, input_shape=(224,224,3), pooling=None)

# set to non-trainable
for layer in vgg.layers: layer.trainable=False

# define a new output layer to connect with the last fc layer in vgg
# thanks to joelthchao https://github.com/fchollet/keras/issues/2371
x = vgg.layers[-2].output
output_layer = Dense(2, activation='softmax', name='predictions')(x)

# combine the original VGG model with the new output layer
vgg2 = Model(inputs=vgg.input, outputs=output_layer)

# compile the new model
vgg2.compile(optimizer=Adam(lr=0.001),
             loss='categorical_crossentropy', metrics=['accuracy'])

# run it!
vgg2.fit_generator(batches,
                   steps_per_epoch=batches.samples // batch_size,
                   validation_data=val_batches,
                   validation_steps=val_batches.samples // batch_size,
                   epochs=1)

Epoch 1/1
359/359 [==============================] - 82s - loss: 0.1343 - acc: 0.9581 - val_loss: 0.0985 - val_acc: 0.9713

1 Like