Lesson Learned: Stagnant Loss Doesn't Mean Incorrect Processing

I am trying to build a model on top of VGG16 using Keras’ built-in applications, and this is what I was doing:

from keras.applications.vgg16 import VGG16
from keras.layers import Dense
from keras.models import Model

# load VGG16 with its ImageNet weights, then drop the predictions, fc2, and fc1 layers
vgg16 = VGG16(weights='imagenet', include_top=True, input_shape=(224, 224, 3))
vgg16.layers.pop()
vgg16.layers.pop()
vgg16.layers.pop()
# freeze what is left and attach a new, randomly initialised head for 25 classes
for layer in vgg16.layers: layer.trainable = False
m = Dense(4096)(vgg16.layers[-1].output)
m = Dense(4096)(m)
m = Dense(25, activation='softmax')(m)
vgg16 = Model(vgg16.input, m)
vgg16.compile(optimizer='adam', loss='categorical_crossentropy')
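Side note (a sanity check I would add here): popping from model.layers at most edits that list and never rewires the graph, so it is worth printing a summary of the rebuilt model to confirm the new head ended up where you expect:

vgg16.summary()               # shows the remaining VGG layers plus the new head
print(vgg16.layers[-1].name)  # should be the new 25-way softmax Dense layer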

Then I was fitting the model, getting horrible results, and the loss kept leveling out. So my (wrong) assumption was that I had something wrong in my workflow. Well, after beating my head against this issue for three days, I finally figured out the problem:

I don’t have enough data to train that many weights. To fix the issue, all I had to do was stop trying to train two fresh 4096-unit dense layers, keep VGG’s pretrained fc layers frozen instead, and train only the final softmax layer. When I did this, I got a much better result and didn’t hit the same plateau. I don’t have any questions on this one; I just wanted to share the gotcha so that anyone who is three days behind me can save themselves the headache.
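To put a rough number on “that many weights” (back-of-the-envelope arithmetic, not something I measured carefully): the two fresh 4096-unit dense layers sit on top of VGG’s 25088-dimensional flatten output, which works out to roughly 120 million trainable weights, while a single 25-way softmax on a 4096-dimensional feature vector is only about 100 thousand. You can check the count on any compiled model like this:

from keras import backend as K

trainable = sum(K.count_params(w) for w in vgg16.trainable_weights)
print('trainable weights:', trainable)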

Here is my final model:

# same idea, but keep VGG's pretrained fc1/fc2 weights and train only a new softmax
vgg16 = VGG16(weights='imagenet', include_top=True, input_shape=(224, 224, 3))
vgg16.layers.pop()   # drop only the 1000-way ImageNet predictions layer
for layer in vgg16.layers: layer.trainable = False
m = Dense(25, activation='softmax')(vgg16.layers[-1].output)
vgg16 = Model(vgg16.input, m)
vgg16.compile(optimizer='adam', loss='categorical_crossentropy')

My plan now is to get a result from ResNet50 and maybe a few of the other prebuilt models in Keras, and then give each of them a vote before deciding on a final prediction.
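In case it helps to picture the voting step, here is a minimal sketch of what I have in mind, assuming each fine-tuned model returns 25-way softmax probabilities from predict() (vgg16_model, resnet50_model, and x_test are placeholders):

import numpy as np

def vote(models, x):
    # each model casts a vote for its argmax class; the most common class wins
    votes = np.stack([np.argmax(m.predict(x), axis=1) for m in models])
    return np.array([np.bincount(votes[:, i], minlength=25).argmax()
                     for i in range(votes.shape[1])])

final = vote([vgg16_model, resnet50_model], x_test)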

Well, scratch that: I am still having issues, but I think I’m closer than I was. I changed my model to the following:

from keras.layers import Flatten

# no classifier head at all this time: flatten the final pooling layer's output
vgg16 = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in vgg16.layers: layer.trainable = False
m = Flatten()(vgg16.layers[-1].output)
m = Dense(25, activation='softmax')(m)
vgg16 = Model(vgg16.input, m)
vgg16.compile(optimizer='adam', loss='categorical_crossentropy')

But I am getting the same prediction over and over. I am going to continue looking into this tomorrow, but if anybody has any insight, or has had the same type of thing happen, I would love to hear about it; hopefully it will spark some ideas.
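One diagnostic I plan to run (a rough sketch, with x_val standing in for whatever validation batch is handy): look at how the predicted classes are distributed. If every count but one is zero, the model has collapsed onto a single class rather than just being inaccurate:

import numpy as np

preds = vgg16.predict(x_val)                                 # x_val: placeholder validation images
print(np.bincount(np.argmax(preds, axis=1), minlength=25))   # per-class prediction counts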

Just so you know, in your original code you did this:

m = Dense(4096)(vgg16.layers[-1].output)
m = Dense(4096)(m)

and then this:

m = Dense(25,activation='softmax')(vgg16.layers[-1].output)

which completely ignores the two Dense layers you had just created. So you were attaching the softmax directly to the flatten layer that precedes the “fc” layers in VGG (i.e., to the flattened output of the final pooling layer).
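In other words, that last line needs to take m so the graph actually flows through the new dense layers:

m = Dense(4096)(vgg16.layers[-1].output)
m = Dense(4096)(m)
m = Dense(25, activation='softmax')(m)  # consumes m, not vgg16.layers[-1].output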


Yep, good call. I fixed that; thanks for the catch!