Adding custom top part to resnet50 model


(Adrian Sahlman) #1

Hey!

I'm trying to fine-tune the ResNet50 model in Keras. I have precomputed the convolutional features from ResNet50 (using include_top=False) so that fitting my model is very fast. Now I'm at the stage where I want to merge the ResNet50 model with my model, and this is where I start having problems.

Code I use to merge the models:

top_model = Sequential()
top_model.add(Flatten(input_shape=(2048,1,1)))
top_model.add(Dense(2,activation='softmax'))
top_model.load_weights(model_top_path_fname)

resn_model = resnet50.ResNet50(include_top=False)

full_model = Model(input=resn_model.input,output=top_model(resn_model.output))

I'm getting this error message:

ValueError: The shape of the input to "Flatten" is not fully defined (got (2048, None, None). Make sure to pass a complete "input_shape" or "batch_input_shape" argument to the first layer in your model.

I have tried passing a batch_input_shape argument to the Flatten() layer as well but with no success. What am I doing wrong?
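A minimal sketch of what the error is about, assuming a current TensorFlow 2 / tf.keras install (the thread uses an older Keras with channels-first ordering, hence (2048, None, None)). With include_top=False and no input_shape, ResNet50 accepts any image size, so the spatial dimensions of its output are unknown and Flatten cannot produce a fixed-length vector for the Dense layer. Fixing input_shape makes every dimension known. weights=None is used only so the sketch runs offline.

```python
from tensorflow import keras

# No input_shape: height and width of the feature map are unknown (None),
# which is exactly what a following Flatten layer cannot handle.
open_base = keras.applications.ResNet50(weights=None, include_top=False)
print(tuple(open_base.output.shape))   # (None, None, None, 2048)

# A fixed input size makes every dimension of the feature map known.
fixed_base = keras.applications.ResNet50(weights=None, include_top=False,
                                         input_shape=(224, 224, 3))
print(tuple(fixed_base.output.shape))  # (None, 7, 7, 2048)
```

The thread's Flatten(input_shape=(2048,1,1)) suggests that the older Keras version in use also applied a 7x7 average pool before the output, which would explain the 1x1 spatial size.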

Also, are the weights going to be copied to my new model from the two merged models or do I have to copy these “by hand”?
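On the weights question: a Keras model called on a tensor behaves like a layer, and the new Model reuses the existing layer objects instead of copying them, so trained weights come along automatically. A small sketch with hypothetical stand-in models (not ResNet50), assuming tf.keras:

```python
import numpy as np
from tensorflow import keras

# Hypothetical stand-ins: "base" plays resn_model, "top" plays top_model.
inp = keras.Input(shape=(8,))
base = keras.Model(inp, keras.layers.Dense(4, activation="relu")(inp))
top = keras.Sequential([keras.Input(shape=(4,)),
                        keras.layers.Dense(2, activation="softmax")])

# Calling top on base.output nests top inside full as a single layer.
full = keras.Model(base.input, top(base.output))

x = np.random.rand(3, 8).astype("float32")
chained = top.predict(base.predict(x, verbose=0), verbose=0)
combined = full.predict(x, verbose=0)
print(np.allclose(chained, combined))  # True: same weights, no copy needed

# Changing top's weights changes full as well, because nothing was copied.
top.set_weights([np.zeros((4, 2)), np.zeros(2)])
print(full.predict(x, verbose=0))  # every row is [0.5, 0.5]
```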

Thanks in advance


(Adrian Sahlman) #2

The only way I was able to solve this problem was by loading ResNet50 with the top included (so the Flatten() layer is part of it), cutting off the final layer and building my model from that:

top_model = Sequential()
top_model.add(Dense(2,activation='softmax',input_shape=(2048,)))
top_model.load_weights(model_top_path_fname)

resn_model = resnet50.ResNet50(include_top=True)
resn_model = Model(resn_model.input,resn_model.layers[-2].output)

full_model = Model(input=resn_model.input,output=top_model(resn_model.output))

Is this workaround required? Or is there a solution to my initial problem?
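In current tf.keras (an assumption; the thread runs an older Keras, where layers[-2] of include_top=True was a Flatten layer), the workaround is not strictly needed. The layer it cuts at is the global average pooling layer named avg_pool, and include_top=False with pooling='avg' ends in the same kind of 2048-dimensional output. Because global pooling works for any spatial size, this also sidesteps the Flatten error without fixing input_shape. weights=None keeps the sketch offline.

```python
from tensorflow import keras

# The workaround: drop the 1000-way "predictions" layer from include_top=True.
with_top = keras.applications.ResNet50(weights=None, include_top=True)
cut = keras.Model(with_top.input, with_top.layers[-2].output)
print(with_top.layers[-2].name, tuple(cut.output.shape))  # avg_pool (None, 2048)

# Alternative: let include_top=False end in global average pooling directly.
pooled = keras.applications.ResNet50(weights=None, include_top=False, pooling="avg")
print(pooled.layers[-1].name, tuple(pooled.output.shape))  # avg_pool (None, 2048)
```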


(Adrian Sahlman) #3

I'm having a lot of problems trying to fine-tune the ResNet50 model. Here is my code:

resn_model = resnet50.ResNet50(include_top=True)
resn_model = Model(resn_model.input,resn_model.layers[-2].output)

batch_size = 64
target_size=(224,224)
train_batches = gen.flow_from_directory(train_path,batch_size=batch_size,class_mode='categorical',target_size=target_size,shuffle=False)
valid_batches = gen.flow_from_directory(valid_path,batch_size=batch_size,class_mode='categorical',target_size=target_size,shuffle=False)

train_labels = onehot_encode(train_batches.classes)
valid_labels = onehot_encode(valid_batches.classes)

train_features = resn_model.predict_generator(train_batches,train_batches.nb_sample)
valid_features = resn_model.predict_generator(valid_batches,valid_batches.nb_sample)

top_model = Sequential()
top_model.add(Dense(2,activation='softmax',input_shape=(2048,)))
top_model.compile(optimizer=Adam(),loss='categorical_crossentropy',metrics=['accuracy'])

top_model.optimizer.lr = 0.0001
nb_epoch = 10
top_model.fit(train_features,train_labels,batch_size=batch_size,validation_data=(valid_features,valid_labels),nb_epoch=nb_epoch)

full_model = Model(input=resn_model.input,output=top_model(resn_model.output))
full_model.compile(optimizer=Adam(),loss='categorical_crossentropy',metrics=['accuracy'])

full_model.evaluate_generator(valid_batches,val_samples=valid_batches.nb_sample)
#Outputs: [0.14526151073591842, 0.94553376906318087]

for layer in full_model.layers:
    layer.trainable=False
full_model.layers[-1].trainable = True
full_model.summary()
#Outputs: Total params: 23,591,810
#         Trainable params: 4,098
#         Non-trainable params: 23,587,712
#(Only the last layer is trainable)

nb_epoch = 1
full_model.fit_generator(train_batches,train_batches.nb_sample,nb_epoch=nb_epoch,
                         validation_data=valid_batches,nb_val_samples=valid_batches.nb_sample)

full_model.evaluate_generator(valid_batches,val_samples=valid_batches.nb_sample)
#Outputs: [2.9434102562373843, 0.63180827886710245]
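The summary numbers can be checked by hand: a Dense layer mapping the 2048-dimensional ResNet50 feature vector to 2 classes has 2048 * 2 kernel weights plus 2 biases, and that is exactly the gap between the total and non-trainable counts. So the freezing step did leave only the top layer trainable.

```python
features = 2048  # length of the pooled ResNet50 feature vector fed to the Dense layer
classes = 2      # units in the softmax output layer

dense_params = features * classes + classes  # kernel weights + one bias per class
print(dense_params)  # 4098

total, non_trainable = 23_591_810, 23_587_712  # from full_model.summary() above
print(total - non_trainable == dense_params)  # True
```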

Am I not keeping the weights from resn_model and top_model when creating full_model? What's going on here? Setting the learning rate to 0 before calling full_model.fit_generator() gives the same result: my accuracy ends up closer to 0.5.
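A side note on the learning rate, hedged because the details depend on the Keras version: assigning a plain float to model.optimizer.lr rebinds the attribute instead of updating the learning-rate variable the optimizer already holds, so in older Keras it could be silently ignored once the training function had been built. A safer pattern, sketched here for current tf.keras, is to pass the rate when constructing the optimizer, or to update the existing variable in place with assign.

```python
from tensorflow import keras

top_model = keras.Sequential([keras.Input(shape=(2048,)),
                              keras.layers.Dense(2, activation="softmax")])

# Set the rate when the optimizer is created...
top_model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
print(round(float(top_model.optimizer.learning_rate.numpy()), 8))  # 0.0001

# ...or change it later by assigning to the existing variable.
top_model.optimizer.learning_rate.assign(1e-5)
print(round(float(top_model.optimizer.learning_rate.numpy()), 8))  # 1e-05
```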

I also tried loading the weights from resn_model and top_model into full_model:

resn_layers = resn_model.layers
top_layers = top_model.layers
full_layers = full_model.layers

for l1,l2 in zip(full_model.layers[:len(resn_layers)],resn_layers):
    l1.set_weights(l2.get_weights())

for l1,l2 in zip(full_model.layers[len(resn_layers):],top_layers):
    l1.set_weights(l2.get_weights())

This changes nothing. I am really lost here and have been googling this issue for hours without finding a solution. Can anyone here please explain what I am doing wrong?
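The copy loop changing nothing is consistent with how the functional API works: because full_model was built from resn_model.input and resn_model.output, its layers are the very same objects as resn_model's (and top_model appears as one nested layer), so each set_weights call writes a layer's weights back onto itself. A sketch with hypothetical stand-in models, assuming tf.keras:

```python
from tensorflow import keras

# Hypothetical stand-ins for resn_model and top_model.
inp = keras.Input(shape=(8,))
resn_model = keras.Model(inp, keras.layers.Dense(4, name="feat")(inp))
top_model = keras.Sequential([keras.Input(shape=(4,)),
                              keras.layers.Dense(2, activation="softmax")],
                             name="top")
full_model = keras.Model(resn_model.input, top_model(resn_model.output))

# Same objects, not copies, so copying weights between them is a no-op.
print(full_model.get_layer("feat") is resn_model.get_layer("feat"))  # True
print(full_model.get_layer("top") is top_model)                       # True
```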

EDIT:
When training top_model, I use a batch_size of 4:

batch_size = 4
target_size=(224,224)
gen = ImageDataGenerator()
train_batches = gen.flow_from_directory(train_path,batch_size=batch_size,class_mode='categorical',target_size=target_size,shuffle=True)
valid_batches = gen.flow_from_directory(valid_path,batch_size=batch_size,class_mode='categorical',target_size=target_size,shuffle=False)

If I train on only batch_size images per epoch, I get:

nb_epoch = 3
full_model.fit_generator(train_batches,batch_size,nb_epoch=nb_epoch,
                     validation_data=valid_batches,nb_val_samples=valid_batches.nb_sample)
#Output:
#Epoch 1/3
#4/4 [==============================] - 11s - loss: 1.1729 - acc: 0.5000 - val_loss: 0.1455 - val_acc: 0.9521
#Epoch 2/3
#4/4 [==============================] - 11s - loss: 8.5129 - acc: 0.2500 - val_loss: 0.1482 - val_acc: 0.9477
#Epoch 3/3
#4/4 [==============================] - 11s - loss: 4.1496 - acc: 0.5000 - val_loss: 0.1514 - val_acc: 0.9477
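The second positional argument of fit_generator is what differs between these two runs. In the Keras 1 API used here it is samples_per_epoch, counted in images, so 4 means a single batch per epoch and train_batches.nb_sample (1836) means one full pass. Keras 2 renamed it to steps_per_epoch, counted in batches, so code ported from this thread needs the number of batches instead:

```python
import math

n_train = 1836   # train_batches.nb_sample, per the progress bar of the full run
batch_size = 4   # batch size used by the generators in the EDIT

# Keras 1: samples_per_epoch is a number of images.
samples_per_epoch = n_train
# Keras 2+: steps_per_epoch is a number of batches.
steps_per_epoch = math.ceil(n_train / batch_size)
print(samples_per_epoch, steps_per_epoch)  # 1836 459
```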

And if I train on the full training set:

nb_epoch = 3
full_model.fit_generator(train_batches,train_batches.nb_sample,nb_epoch=nb_epoch,
                     validation_data=valid_batches,nb_val_samples=valid_batches.nb_sample)
#Output:
#Epoch 1/3
#1836/1836 [==============================] - 96s - loss: 3.0078 - acc: 0.4929 - val_loss: 2.9002 - val_acc: 0.5381
#Epoch 2/3
#1836/1836 [==============================] - 95s - loss: 3.0078 - acc: 0.4929 - val_loss: 3.0234 - val_acc: 0.5272
#Epoch 3/3
#1836/1836 [==============================] - 95s - loss: 3.0078 - acc: 0.4929 - val_loss: 3.0246 - val_acc: 0.5272

After training on the full training set, the weights are all wrong, even though I use a very small learning rate: full_model.optimizer.lr = 0.0000000001
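One commonly cited factor in reports like this one, offered as a hypothesis rather than a confirmed diagnosis of this thread: ResNet50 contains many BatchNormalization layers, and in Keras versions of that era trainable=False froze their scale and offset but did not stop them from normalising with the statistics of the current mini-batch during fit. With batch_size=4 those statistics are noisy and differ from the running averages used at evaluation time. The sketch below, assuming TF2-era tf.keras, shows how different the two modes are on a 4-sample batch, and that newer Keras runs a frozen BatchNormalization layer in inference mode.

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
# A batch of 4 vectors whose statistics differ from the layer's running
# averages (initialised to mean 0, variance 1).
x = rng.normal(loc=5.0, scale=2.0, size=(4, 3)).astype("float32")

bn = keras.layers.BatchNormalization()
train_out = np.asarray(bn(x, training=True))   # uses this batch's own mean/variance
infer_out = np.asarray(bn(x, training=False))  # uses the running averages

print(np.abs(train_out.mean(axis=0)).max() < 1e-4)  # True: output is batch-centred
print(np.abs(infer_out - train_out).max() > 1.0)    # True: the two modes disagree

# In TF2-era Keras, a frozen BatchNormalization layer stays in inference
# mode even when called with training=True.
bn.trainable = False
frozen_out = np.asarray(bn(x, training=True))
print(np.allclose(frozen_out, infer_out))  # True
```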


(Adrian Sahlman) #4

Here's a guy having the same problem:

No one answered his post, though, so I'm still unable to solve this problem. Can anyone help?


(Adrian Sahlman) #5

I found a solution. The problem arises when I create my full model out of the resnet50 model and my layers. Solution:

top_model = Sequential()
top_model.add(Dense(2,activation='softmax',input_shape=(2048,)))
top_model.load_weights(model_top_path_fname)


resn_model = resnet50.ResNet50(include_top=True)
resn_model.layers.pop()

for layer in resn_model.layers:
    layer.trainable = False

last = resn_model.layers[-1].output

x = Dense(2,activation='softmax',weights=top_model.layers[-1].get_weights())(last)

full_model = Model(resn_model.input,x)
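Note that layers.pop() does not rewire the model's outputs; the solution above works because, after the pop, layers[-1] refers to the layer before the classifier (in newer tf.keras, where model.layers may return a fresh list, the pop may not even do that). A sketch of the same idea that avoids pop() and the weights= constructor argument, assuming current tf.keras, with a hypothetical stand-in for the trained top_model and weights=None so it runs offline: freeze the base, build a fresh Dense head on the 2048-d pooled output, then copy the trained kernel and bias into it with set_weights.

```python
import numpy as np
from tensorflow import keras

# Hypothetical stand-in for the trained top_model (in the thread its
# weights are loaded from model_top_path_fname).
top_model = keras.Sequential([keras.Input(shape=(2048,)),
                              keras.layers.Dense(2, activation="softmax")])

resn_model = keras.applications.ResNet50(weights=None, include_top=False,
                                         pooling="avg", input_shape=(224, 224, 3))
resn_model.trainable = False  # freezes every layer inside ResNet50

# Fresh head on the pooled features, then copy in the trained kernel and bias.
head = keras.layers.Dense(2, activation="softmax")
out = head(resn_model.output)
head.set_weights(top_model.layers[-1].get_weights())

full_model = keras.Model(resn_model.input, out)
print(len(full_model.trainable_weights))  # 2: the head's kernel and bias
```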

If I want to fine-tune the convolutional layers, this piece of code does not work:

for layer in full_model.layers:
    layer.trainable = True

I have to set the layers in the ResNet50 model to trainable when I create my full model if I want to tune the convolutional layers as well:

for layer in resn_model.layers:
    layer.trainable = True

last = resn_model.layers[-1].output

x = Dense(2,activation='softmax',weights=top_model.layers[-1].get_weights())(last)

full_model = Model(resn_model.input,x)

Can someone clear this up for me? Is this a Python-specific problem where I have misunderstood how Python works, or am I using Keras incorrectly?
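A likely explanation, hedged because the thread does not show where compile() is called: in Keras, changes to layer.trainable only take effect for training once the model is compiled again, so flipping the flags on an already compiled full_model appears to do nothing until compile() is called a second time. Building a new model, as in the working version, presumably goes through a fresh compile anyway. A sketch with a hypothetical stand-in base, assuming tf.keras:

```python
from tensorflow import keras

# Hypothetical stand-in for resn_model: one Dense "feature" layer.
inp = keras.Input(shape=(8,))
base = keras.Model(inp, keras.layers.Dense(4, activation="relu", name="feat")(inp))
head = keras.layers.Dense(2, activation="softmax", name="head")
full = keras.Model(base.input, head(base.output))

# Phase 1: freeze the base and train only the head.
for layer in base.layers:
    layer.trainable = False
full.compile(optimizer="adam", loss="categorical_crossentropy")
n_frozen = len(full.trainable_weights)
print(n_frozen)  # 2 (head kernel and bias)

# Phase 2: unfreeze everything, then compile again so training uses the new flags.
for layer in full.layers:
    layer.trainable = True
full.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
             loss="categorical_crossentropy")
n_unfrozen = len(full.trainable_weights)
print(n_unfrozen)  # 4 (feat and head kernels and biases)
```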


(Flo) #6

When using include_top=False, you have to specify an input_shape!
For example:
resn_model = ResNet50(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
In your case, you may have to put the channel axis first if the above doesn't work:
resn_model = ResNet50(weights='imagenet', include_top=False, input_shape=(3, 299, 299))

That is why the error message says that the input to Flatten is not fully defined.
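Rather than guessing the channel order, the input shape can be picked from the image data format Keras is configured with. A small sketch, assuming tf.keras:

```python
from tensorflow import keras

# "channels_last" unless the Keras config says otherwise.
data_format = keras.backend.image_data_format()
if data_format == "channels_first":
    input_shape = (3, 299, 299)
else:
    input_shape = (299, 299, 3)
print(data_format, input_shape)
```

The resulting input_shape can then be passed to ResNet50 as in the examples above.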
Another thing:

I have calculated the convolution features from the resnet50 […] Now I'm at a stage where I want to merge the resnet50 model and my model

Why do you want to merge them? You already have the precomputed features from the ResNet model. You can now just run them through your top_model!


(Adrian Sahlman) #7

When using include_top=False you have to specify an input_shape!
For example:
resn_model = ResNet50(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
In your case you maybe have to flip around the channel if above doesn’t work:
resn_model = ResNet50(weights='imagenet', include_top=False, input_shape=(3, 299, 299))

I did try that, but I still got the error! I solved that issue by merging them in another way.
In the Keras source code for the ResNet50 model, it seems that not passing an input shape does not matter unless you want an unconventional (not 224x224) input shape:

    input_shape = _obtain_input_shape(input_shape,
                                  default_size=224,
                                  min_size=197,
                                  data_format=K.image_data_format(),
                                  include_top=include_top)

    if input_tensor is None:
        img_input = Input(shape=input_shape)
    else:
        if not K.is_keras_tensor(input_tensor):
            img_input = Input(tensor=input_tensor, shape=input_shape)
        else:
            img_input = input_tensor

Why do you want to merge them? You already have the pre computed features from the ResNet model. You now can just run them through your top_model!

I did that at first to train the weights of my top_model. Then I wanted to merge them because I'm using data augmentation and therefore can't precompute the convolutional features. I also want to try fine-tuning the convolutional layers, even if that might not help my model.
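One way to combine on-the-fly augmentation with a frozen ResNet50 in current tf.keras, offered as an alternative to ImageDataGenerator rather than what the thread used, is to put Keras preprocessing layers (RandomFlip, RandomRotation) in front of the base model so every batch is augmented and then passed through ResNet50. This sketch uses weights=None to stay offline and omits resnet50.preprocess_input, which real ImageNet weights would need. Calling the base with training=False keeps its BatchNormalization layers in inference mode even if it is later unfrozen for fine-tuning.

```python
from tensorflow import keras

base = keras.applications.ResNet50(weights=None, include_top=False,
                                   pooling="avg", input_shape=(224, 224, 3))
base.trainable = False  # train only the head first

inputs = keras.Input(shape=(224, 224, 3))
x = keras.layers.RandomFlip("horizontal")(inputs)  # random only during training
x = keras.layers.RandomRotation(0.1)(x)
x = base(x, training=False)  # keep BatchNormalization in inference mode
outputs = keras.layers.Dense(2, activation="softmax")(x)
full_model = keras.Model(inputs, outputs)
print(tuple(full_model.output.shape))  # (None, 2)
```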