How to finetune with new Keras API?

Anyone working with the new Keras API?

The finetuning code doesn’t work, and I can’t figure out how to implement it with the 2.x branch of Keras.

What’s the error you are getting?
Some methods and parameters have been renamed in the new API.

Many errors actually …

The whole finetuning code is broken it seems.

model.pop() throws an exception, and it appears you can no longer just call model.add(Dense(...)) to append new layers before compiling the model either.

Looks like finetuning is drastically different with the latest version of Keras

I had to use the functional API to finetune the model, as shown below. Is there a better way?

from keras.models import Model
from keras.layers import Dense
from keras.optimizers import Adam

model.layers.pop()

# freeze all of the remaining pretrained layers
for layer in model.layers:
    layer.trainable = False

# recover the output from the last remaining layer and use it as input to a new Dense layer
last = model.layers[-1].output
x = Dense(train_batches.num_class, activation="softmax")(last)
model = Model(model.input, x)

model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
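
After compiling I just train with fit_generator. A minimal sketch only: train_batches and valid_batches are assumed to be flow_from_directory iterators, and the epoch count is arbitrary.

model.fit_generator(train_batches,
                    steps_per_epoch=train_batches.samples // train_batches.batch_size,
                    epochs=3,
                    validation_data=valid_batches,
                    validation_steps=valid_batches.samples // valid_batches.batch_size)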

The pop method now lives on the model’s layers list rather than on the model itself.

Try this instead:

model.layers.pop()

Correct, and the same goes for model.add().

See my code above. I’m not sure whether there’s another or better way to finetune, but at least it works.

If you use the Keras 2 VGG16 application you can exclude the top layers by setting include_top=False:

from keras import applications
from keras.layers import Input, Flatten, Dense, Dropout
from keras.models import Sequential, Model

input_tensor = Input(shape=(img_width, img_height, 3))
# build the VGG16 network without its fully-connected top
base_model = applications.VGG16(weights='imagenet', include_top=False, input_tensor=input_tensor)
print('Model loaded.')
base_model.summary()

# finetune: build a new classifier head on top of the VGG16 convolutional base
top_model = Sequential()
top_model.add(Flatten(input_shape=base_model.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(6, activation='sigmoid'))
top_model.load_weights(top_model_weights_path)

# attach the head to the base model's output (Keras 2 uses inputs=/outputs=)
model = Model(inputs=base_model.input, outputs=top_model(base_model.output))

# freeze the convolutional layers
for layer in model.layers[:18]:
    layer.trainable = False

The problem with this is that you’re not fine tuning the FC layers now, but are training them from scratch. If you don’t have plenty of data, this results in a significant accuracy loss (and increase in training time).

@jeremy, thank you for pointing this out. That’s the case for the BN version, but weights='imagenet' loads the weights from the local cache if available, and otherwise downloads them from the Keras site by default. So I think it should be fine.

I just checked the Keras code, and the pretrained weights aren’t loaded into any FC layers with include_top=False. So you’ll need to set it to True and then split the model afterwards, as we’ve done in the lessons.
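
Something like this (a rough sketch only, not the lessons’ code; num_classes is a placeholder for however many classes your dataset has):

from keras import applications
from keras.layers import Dense
from keras.models import Model

# load VGG16 *with* its fully-connected top so the FC weights come pretrained
full_model = applications.VGG16(weights='imagenet', include_top=True)

# keep everything up to the penultimate (fc2) layer and attach a new classifier
penultimate = full_model.layers[-2].output
new_output = Dense(num_classes, activation='softmax')(penultimate)  # num_classes: placeholder
model = Model(full_model.input, new_output)

# freeze everything except the new layer before compiling
for layer in model.layers[:-1]:
    layer.trainable = False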


I believe the approach I outlined above is correct. It uses the functional API to do the finetuning like so:

model.layers.pop()
for layer in model.layers:
    layer.trainable = False

# recover the output from the last remaining layer and use it as input to a new Dense layer
last = model.layers[-1].output
x = Dense(train_batches.num_class, activation="softmax")(last)
model = Model(model.input, x)

model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

The Keras VGG implementations are built using the functional API, so I’m assuming this or a very similar approach is needed to finetune.


Looks right to me, although it’s likely to be much faster to precompute the penultimate layer’s output and then just train the linear classifier at the end. The approach you’ve shown still has to do the forward pass through the whole network on every epoch.
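
In case it helps, here’s roughly what that looks like (a sketch only: it assumes model is still the pretrained network before the new head is added, so layers[-2] is its last hidden FC layer, that train_batches comes from flow_from_directory with shuffle=False, and the epoch/batch numbers are arbitrary):

from keras.models import Model, Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# model that stops at the penultimate (last hidden FC) layer
feature_model = Model(model.input, model.layers[-2].output)

# run the data through the big network once and cache the activations
train_features = feature_model.predict_generator(
    train_batches, steps=train_batches.samples // train_batches.batch_size)
train_labels = train_batches.classes[:len(train_features)]  # needs shuffle=False to stay aligned

# small classifier trained on the cached features, so no forward pass through VGG each epoch
classifier = Sequential()
classifier.add(Dense(train_batches.num_class, activation='softmax',
                     input_shape=train_features.shape[1:]))
classifier.compile(optimizer=Adam(lr=0.001), loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])
classifier.fit(train_features, train_labels, epochs=5, batch_size=64)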

Thanks Jeremy, this is a big deal. I ran into exactly this issue and was wondering why it was taking so long to train!

How can I convert vgg16_bn.h5 to a TensorFlow-backend model?

Maybe this can help?

Converting convolution kernels from Theano to TensorFlow and vice versa:

If you want to load pre-trained weights that include convolutions (layers Convolution2D or Convolution1D), be mindful of this:
Theano and TensorFlow implement convolution in different ways (TensorFlow actually implements correlation, much like Caffe), and thus, convolution kernels trained with Theano (resp. TensorFlow) need to be converted before being used with TensorFlow (resp. Theano).
Here’s how:
https://github.com/fchollet/keras/wiki/Converting-convolution-kernels-from-Theano-to-TensorFlow-and-vice-versa
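
The gist of it is roughly this (a sketch assuming a Keras 2 model that already has the Theano-trained weights loaded; the output filename is a placeholder, and the wiki covers the version-specific details):

from keras import backend as K
from keras.utils.conv_utils import convert_kernel

# flip each convolution kernel so Theano-trained weights behave correctly under TensorFlow
for layer in model.layers:
    if layer.__class__.__name__ in ['Conv1D', 'Conv2D', 'Conv3D']:
        original_w = K.get_value(layer.kernel)
        converted_w = convert_kernel(original_w)
        K.set_value(layer.kernel, converted_w)

model.save_weights('vgg16_bn_tf.h5')  # placeholder filename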

Inspired by the ideas of Lesson 7, I’m trying to build my own Fully Convolutional Network with current versions of the libraries (Keras 2.0.9 on Theano 1.0.1 in particular).
I’m trying to take the ideas from the provided notebooks, but as far as possible not the code itself.
I found that Keras 2 includes a “built-in” implementation of the VGG16 model, and I’m trying to use it. In contrast to the VGG16 version we used during our lessons (which, as far as I understand, @jeremy developed especially for the classes), the built-in VGG16 is not a Sequential model, so it has no model.pop() method.

My first idea was to use model.layers.pop() (as @mmusket already suggested above), but it is not an exact equivalent of model.pop() on a Sequential model: after model.layers.pop() the model still produced the same output as before.
A more precise equivalent is something like the code below. In particular it allowed me to exclude the last MaxPooling2D layer of the built-in VGG16 and precompute the features before that layer (exactly as @jeremy suggested). I’m not sure if there is a more elegant, compact or idiomatic way to do it with Keras, and I’d be glad if somebody could point me to one.

model.layers.pop()
# popping from model.layers alone is not enough: also detach the dangling node and reset the model's outputs
model.layers[-1].outbound_nodes = []
model.outputs = [model.layers[-1].output]

This also works if you want to simply get the output of the penultimate layer:

headless_model = Model(model.input, model.layers[-2].output)
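
For example, to check it (the generator name is just a placeholder for a flow_from_directory iterator):

# feed one batch through the truncated model to get penultimate-layer features
batch, _ = next(train_batches)
features = headless_model.predict(batch)
print(features.shape)  # the penultimate layer's output shape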