Lesson 3 discussion

devsp · January 8, 2017, 7:04pm

Hi! I’m currently trying out the “Removing Dropout” section but I get weird results.

If I build the new model with get_fc_model() and train it on the trn_features I calculated earlier, I only get an accuracy near 0.5 most of the time. To clarify: the full model (conv + fc layers), from which the conv layers are extracted and trn_features are calculated, has an accuracy near 97%, so I really don’t get why this is happening.

My guess is that by removing the Dropout I somehow messed up the loaded weights, so the model needs to be trained again. Does anybody have an idea what’s going on? Thanks in advance!

...
conv_layers, fc_layers = split_conv_fc(vgg.model)
conv_model = Sequential(conv_layers)

train_features = conv_model.predict_generator(train_batches, train_batches.nb_sample)
val_features = conv_model.predict_generator(val_batches, val_batches.nb_sample)
val_classes = val_batches.classes
train_classes = train_batches.classes
val_labels = onehot(val_classes)
train_labels = onehot(train_classes)

model = get_fc_model()
model.fit(train_features, train_labels, nb_epoch=8, 
      batch_size=batch_size, validation_data=(val_features, val_labels))

EDIT: I’ve just discovered that while the training accuracy improves a bit after several epochs (from 0.5 to 0.7, still quite low), the validation accuracy stays at 0.5.

devsp · January 8, 2017, 11:18pm

I found the problem. I was setting shuffle=True as a parameter when getting the batches with flow_from_directory. After I set it to False everything worked fine.
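A likely explanation, which the post itself does not spell out: with shuffle=True the generator yields images in random order, while batches.classes lists the labels in directory order, so the precomputed features and the labels no longer line up. The sketch below illustrates this with stand-in data only. fake_feature and accuracy are hypothetical helpers, not part of Keras or the course code.

```python
import random

random.seed(0)

# Stand-in for batches.classes: labels in directory order.
classes = [i % 2 for i in range(1000)]

def fake_feature(label):
    # Hypothetical "feature" that perfectly encodes the label.
    return float(label)

def accuracy(features, labels):
    # Trivial "classifier": predict class 1 when the feature is above 0.5.
    hits = sum(int(f > 0.5) == y for f, y in zip(features, labels))
    return hits / float(len(labels))

# shuffle=False: features come out in the same order as `classes`.
ordered_features = [fake_feature(y) for y in classes]

# shuffle=True: the generator visits the images in a random order,
# but `classes` is still in directory order.
order = list(range(len(classes)))
random.shuffle(order)
shuffled_features = [fake_feature(classes[i]) for i in order]

acc_ordered = accuracy(ordered_features, classes)
acc_shuffled = accuracy(shuffled_features, classes)
print(acc_ordered, acc_shuffled)
```

With aligned labels the toy classifier is perfect. With misaligned labels it lands near chance, which is consistent with the 0.5 accuracy reported above.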

rachel · January 9, 2017, 3:58am

@complancoder

For the mean (vgg_mean) that we subtract from each element of the array, can’t we just use 128, since we know the values will always be within (0, 255)?

Since we are downloading pretrained weights, we need to normalize our data the same way that the data was normalized when the network was trained (which was using the mean of ImageNet data).

So I see we change from RGB to BGR? What is the reason for doing so? I read that OpenCV uses BGR for reading image files, but I don’t see any usage of OpenCV. Am I missing something?

The team that originally trained this model (the one we are getting the weights from) used BGR, so the weights assume that order.
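To make the two answers concrete, here is a minimal NumPy sketch of this kind of preprocessing. The mean values are the commonly published per-channel ImageNet means for VGG in RGB order. The function follows what I recall of the course's vgg16.py, so treat the exact form as an assumption rather than the definitive implementation.

```python
import numpy as np

# Per-channel ImageNet means in RGB order (commonly published VGG values).
vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3, 1, 1))

def vgg_preprocess(x):
    """x: batch of images shaped (n, 3, height, width), channels in RGB order."""
    x = x - vgg_mean      # subtract each channel's own mean (broadcast over n, h, w)
    return x[:, ::-1]     # reverse the channel axis: RGB -> BGR

# A dummy batch where every pixel is 128.
batch = np.full((2, 3, 4, 4), 128.0, dtype=np.float32)
out = vgg_preprocess(batch)
print(out[0, :, 0, 0])    # one value per channel, now in BGR order
```

Subtracting a flat 128 would shift every channel by a different amount than the network saw during training, which is why the per-channel means are reused.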

tmu · January 14, 2017, 2:52pm

I’m working through Lesson 3 by coding steps myself and applying them to Dogs and Cats data.

How do I combine a separately trained convolutional model and fully connected model to get predictions?

I separated the convolutional layers into their own model (conv_model) and the fully connected layers, with Dropout 0, into their own model (fc_model). Then I fitted the fully connected part separately, just like in the notebook. The model fitted well and started to overfit.

Now, before proceeding to data augmentation, batch norm etc., I’d like to get predictions on the Dogs and Cats test data. How do I combine these two models into a single one that can do the predictions?

tmu · January 14, 2017, 3:37pm

Answering my own question: I was able to combine them by copying the convolution layers, then adding layers with the same structure as the retrained fully connected layers and copying the weights over from them.

def get_fc_layers():
    input_shape = conv_model.layers[-1].output_shape[1:]
    return [
            MaxPooling2D(input_shape=input_shape),
            Flatten(),
            Dense(4096, activation="relu"),
            Dropout(0.),
            Dense(4096, activation="relu"),
            Dropout(0.),
            Dense(2, activation="softmax")
    ]

vgg2 = vgg16.Vgg16()
last_conv_ix = [ix for ix, layer in enumerate(vgg2.model.layers) if type(layer) == Convolution2D][-1]
conv_layers = vgg2.model.layers[:last_conv_ix+1]

no_dropout_model = Sequential(conv_layers)
no_dropout_model.summary()

fc_layers = get_fc_layers()
for layer in fc_layers: 
    no_dropout_model.add(layer)

for l1,l2 in zip(fc_model.layers, fc_layers):
    l2.set_weights(l1.get_weights())

no_dropout_model.summary()

With this model (retrained fully connected model with dropout 0, no batch norm, no data augmentation) I was able to get a score below 0.079 (top 15%). The model was already overfitting, so it seems the next steps can improve the score even further.

Is it safe to add the same layers to multiple models? Or is it better to create a new set of layers and copy the weights?

zaiddabaeen · January 14, 2017, 7:26pm

I am still not able to get this to work:

def get_fc_model():
    model = Sequential([
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.),
        Dense(4096, activation='relu'),
        Dropout(0.),
        Dense(2, activation='softmax')
        ])

    for l1,l2 in zip(model.layers, fc_layers): l1.set_weights(proc_wgts(l2))

    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

fc_model = get_fc_model()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-61-e9cf85a7e40b> in <module>()
----> 1 fc_model = get_fc_model()

<ipython-input-60-169709b3e88d> in get_fc_model()
     10         ])
     11 
---> 12     for l1,l2 in zip(model.layers, fc_layers): l1.set_weights(proc_wgts(l2))
     13 
     14     model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in set_weights(self, weights)
    964                              str(len(params)) +
    965                              ' weights. Provided weights: ' +
--> 966                              str(weights)[:50] + '...')
    967         if not params:
    968             return

ValueError: You called `set_weights(weights)` on layer "dropout_21" with a  weight list of length 4, but the layer was expecting 0 weights. Provided weights: [array([ 20.3107,  10.0247,  10.8138, ...,  12.088...

Note: No idea why it has got to dropout_21 (and keeps increasing), even though I’ve reinitialized the model and loaded the features again (I just skipped conv_model.predict_generator and used load_array).

tmu · January 14, 2017, 9:03pm

Zaid, the problem is likely in the step where you take fc_layers from the pretrained model. Print fc_layers and make sure they have the same structure as what you construct in get_fc_model:

for layer in fc_layers: 
    print layer.name, layer.input_shape, layer.output_shape
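The same comparison can be automated. The sketch below uses a hypothetical FakeLayer stand-in instead of real Keras layers, so it runs without Keras. With real layers you would compare type(layer).__name__ and len(layer.get_weights()) in the same way. The example stacks are an assumption chosen to mirror the error above: a weight list of length 4 handed to a Dropout layer is what you would expect if a BatchNormalization layer in the old stack lines up with Dropout in the new one.

```python
class FakeLayer(object):
    """Hypothetical stand-in for a Keras layer: a type name plus a weight list."""
    def __init__(self, kind, n_weights):
        self.kind = kind
        self._weights = [0.0] * n_weights

    def get_weights(self):
        return self._weights

def find_mismatches(new_layers, old_layers):
    """Return the places where the two layer stacks disagree."""
    problems = []
    if len(new_layers) != len(old_layers):
        problems.append(('length', len(new_layers), len(old_layers)))
    for i, (new, old) in enumerate(zip(new_layers, old_layers)):
        same_kind = new.kind == old.kind
        same_count = len(new.get_weights()) == len(old.get_weights())
        if not (same_kind and same_count):
            problems.append((i, new.kind, old.kind))
    return problems

# New stack without batch norm, old stack with batch norm (assumed for illustration).
new_stack = [FakeLayer(k, n) for k, n in [
    ('MaxPooling2D', 0), ('Flatten', 0), ('Dense', 2), ('Dropout', 0),
    ('Dense', 2), ('Dropout', 0), ('Dense', 2)]]
old_stack = [FakeLayer(k, n) for k, n in [
    ('MaxPooling2D', 0), ('Flatten', 0), ('Dense', 2), ('BatchNormalization', 4),
    ('Dropout', 0), ('Dense', 2), ('BatchNormalization', 4), ('Dropout', 0),
    ('Dense', 2)]]

problems = find_mismatches(new_stack, old_stack)
for p in problems:
    print(p)
```

Running a check like this before the set_weights loop points at the first misaligned index instead of failing inside Keras.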

matt.mcclean · January 14, 2017, 9:14pm

Hi, I am working through the Lesson 3 Jupyter workbook and want to run the Batch Normalization example.

It references the weights of the new model that Jeremy trained on ImageNet in the cell with the text: bn_model.load_weights('/data/jhoward/ILSVRC2012_img/bn_do3_1.h5')

I can’t seem to find this file anywhere on the wiki or github.

Is there somewhere I can download the file named bn_do3_1.h5 to test out this new model?

Jonas · January 14, 2017, 9:44pm

I am reading chapter 4 of Nielsen’s Neural Networks and Deep Learning and I came across this expression:
σ∘f(x)
but I do not understand what the ∘ - symbol stands for. Can someone help me? Thanks!
(it’s here, about halfway through http://neuralnetworksanddeeplearning.com/chap4.html)

UPDATE: I think I figured it out: it is function composition, i.e. g∘f(x) means g(f(x)). We never used that symbol at university.
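In code, composition is just one function applied to the result of another. A small sketch follows. The inner function f is a hypothetical example chosen for illustration, not the one from the book.

```python
import math

def compose(g, f):
    """Return the composition "g after f", i.e. the function x -> g(f(x))."""
    return lambda x: g(f(x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def f(x):
    # Hypothetical inner function, chosen only for illustration.
    return 2.0 * x - 1.0

sigma_of_f = compose(sigmoid, f)

# f(0.5) = 0 and sigmoid(0) = 0.5, so both calls below give 0.5.
print(sigma_of_f(0.5), sigmoid(f(0.5)))
```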

tmu · January 14, 2017, 10:54pm

I stumbled on the same issue, but then decided to move on and adapt the lesson to use vgg16bn (it is in the course’s GitHub repository). Basically, with vgg16bn you get pretrained weights just like with vgg16, and then you can start to fine-tune and retrain the later layers.

matt.mcclean · January 14, 2017, 11:07pm

OK, thanks. I will try the same.

devsp · January 15, 2017, 10:05am

Hi, I have a question in the Batch Normalization section. Why do we do this?:

for l in bn_model.layers: 
    if type(l)==Dense: l.set_weights(proc_wgts(l, 0.3, 0.6))

I mean, as I understand it, this line assumes you previously had p = 0.3 in your Dropout layers and you want to “migrate” the weights to p = 0.6. But in the notebook we had p = 0 in the previous section, so I’m a bit lost.

Also, I still don’t understand the need for proc_wgts(layer, prev_p, new_p). The opposite direction I understand: when you had a Dropout with p > 0 and you are setting p = 0 (because you need to drop some activations to copy them to the new model). But when you have p = 0 and you want p > 0 I still don’t get it, because you didn’t get rid of any information before.

Thanks in advance!

zaiddabaeen · January 15, 2017, 12:43pm

Thanks. It seems my model already had BatchNormalization (it seems the code has changed since the writing of that python notebook).

Works great now!

zaiddabaeen · January 15, 2017, 12:55pm

Got an error at fc_model.predict

ValueError: Error when checking : expected maxpooling2d_input_1 to have shape (None, 512, 14, 14) but got array with shape (64, 3, 224, 224)

fc_model.summary()
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
maxpooling2d_6 (MaxPooling2D) (None, 512, 7, 7) 0 maxpooling2d_input_1[0][0]
____________________________________________________________________________________________________
flatten_2 (Flatten) (None, 25088) 0 maxpooling2d_6[0][0]
____________________________________________________________________________________________________
dense_5 (Dense) (None, 4096) 102764544 flatten_2[0][0]
____________________________________________________________________________________________________
batchnormalization_3 (BatchNorma (None, 4096) 16384 dense_5[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout) (None, 4096) 0 batchnormalization_3[0][0]
____________________________________________________________________________________________________
dense_6 (Dense) (None, 4096) 16781312 dropout_3[0][0]
____________________________________________________________________________________________________
batchnormalization_4 (BatchNorma (None, 4096) 16384 dense_6[0][0]
____________________________________________________________________________________________________
dropout_4 (Dropout) (None, 4096) 0 batchnormalization_4[0][0]
____________________________________________________________________________________________________
dense_7 (Dense) (None, 2) 8194 dropout_4[0][0]
====================================================================================================
Total params: 119,586,818
Trainable params: 119,570,434
Non-trainable params: 16,384

model.summary()
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
lambda_1 (Lambda) (None, 3, 224, 224) 0 lambda_input_1[0][0]
____________________________________________________________________________________________________
zeropadding2d_1 (ZeroPadding2D) (None, 3, 226, 226) 0 lambda_1[0][0]
____________________________________________________________________________________________________
convolution2d_1 (Convolution2D) (None, 64, 224, 224) 1792 zeropadding2d_1[0][0]
____________________________________________________________________________________________________
zeropadding2d_2 (ZeroPadding2D) (None, 64, 226, 226) 0 convolution2d_1[0][0]
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D) (None, 64, 224, 224) 36928 zeropadding2d_2[0][0]
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D) (None, 64, 112, 112) 0 convolution2d_2[0][0]
____________________________________________________________________________________________________
zeropadding2d_3 (ZeroPadding2D) (None, 64, 114, 114) 0 maxpooling2d_1[0][0]
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D) (None, 128, 112, 112) 73856 zeropadding2d_3[0][0]
____________________________________________________________________________________________________
zeropadding2d_4 (ZeroPadding2D) (None, 128, 114, 114) 0 convolution2d_3[0][0]
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D) (None, 128, 112, 112) 147584 zeropadding2d_4[0][0]
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D) (None, 128, 56, 56) 0 convolution2d_4[0][0]
____________________________________________________________________________________________________
zeropadding2d_5 (ZeroPadding2D) (None, 128, 58, 58) 0 maxpooling2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D) (None, 256, 56, 56) 295168 zeropadding2d_5[0][0]
____________________________________________________________________________________________________
zeropadding2d_6 (ZeroPadding2D) (None, 256, 58, 58) 0 convolution2d_5[0][0]
____________________________________________________________________________________________________
convolution2d_6 (Convolution2D) (None, 256, 56, 56) 590080 zeropadding2d_6[0][0]
____________________________________________________________________________________________________
zeropadding2d_7 (ZeroPadding2D) (None, 256, 58, 58) 0 convolution2d_6[0][0]
____________________________________________________________________________________________________
convolution2d_7 (Convolution2D) (None, 256, 56, 56) 590080 zeropadding2d_7[0][0]
____________________________________________________________________________________________________
maxpooling2d_3 (MaxPooling2D) (None, 256, 28, 28) 0 convolution2d_7[0][0]
____________________________________________________________________________________________________
zeropadding2d_8 (ZeroPadding2D) (None, 256, 30, 30) 0 maxpooling2d_3[0][0]
____________________________________________________________________________________________________
convolution2d_8 (Convolution2D) (None, 512, 28, 28) 1180160 zeropadding2d_8[0][0]
____________________________________________________________________________________________________
zeropadding2d_9 (ZeroPadding2D) (None, 512, 30, 30) 0 convolution2d_8[0][0]
____________________________________________________________________________________________________
convolution2d_9 (Convolution2D) (None, 512, 28, 28) 2359808 zeropadding2d_9[0][0]
____________________________________________________________________________________________________
zeropadding2d_10 (ZeroPadding2D) (None, 512, 30, 30) 0 convolution2d_9[0][0]
____________________________________________________________________________________________________
convolution2d_10 (Convolution2D) (None, 512, 28, 28) 2359808 zeropadding2d_10[0][0]
____________________________________________________________________________________________________
maxpooling2d_4 (MaxPooling2D) (None, 512, 14, 14) 0 convolution2d_10[0][0]
____________________________________________________________________________________________________
zeropadding2d_11 (ZeroPadding2D) (None, 512, 16, 16) 0 maxpooling2d_4[0][0]
____________________________________________________________________________________________________
convolution2d_11 (Convolution2D) (None, 512, 14, 14) 2359808 zeropadding2d_11[0][0]
____________________________________________________________________________________________________
zeropadding2d_12 (ZeroPadding2D) (None, 512, 16, 16) 0 convolution2d_11[0][0]
____________________________________________________________________________________________________
convolution2d_12 (Convolution2D) (None, 512, 14, 14) 2359808 zeropadding2d_12[0][0]
____________________________________________________________________________________________________
zeropadding2d_13 (ZeroPadding2D) (None, 512, 16, 16) 0 convolution2d_12[0][0]
____________________________________________________________________________________________________
convolution2d_13 (Convolution2D) (None, 512, 14, 14) 2359808 zeropadding2d_13[0][0]
____________________________________________________________________________________________________
maxpooling2d_5 (MaxPooling2D) (None, 512, 7, 7) 0 convolution2d_13[0][0]
____________________________________________________________________________________________________
flatten_1 (Flatten) (None, 25088) 0 maxpooling2d_5[0][0]
____________________________________________________________________________________________________
dense_1 (Dense) (None, 4096) 102764544 flatten_1[0][0]
____________________________________________________________________________________________________
batchnormalization_1 (BatchNorma (None, 4096) 16384 dense_1[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout) (None, 4096) 0 batchnormalization_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense) (None, 4096) 16781312 dropout_1[0][0]
____________________________________________________________________________________________________
batchnormalization_2 (BatchNorma (None, 4096) 16384 dense_2[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout) (None, 4096) 0 batchnormalization_2[0][0]
____________________________________________________________________________________________________
dense_4 (Dense) (None, 2) 8194 dropout_2[0][0]
====================================================================================================
Total params: 134,301,506
Trainable params: 8,194
Non-trainable params: 134,293,312
____________________________________________________________________________________________________

My fc_layers before compiling new model:

maxpooling2d_5 (None, 512, 14, 14) (None, 512, 7, 7)
flatten_1 (None, 512, 7, 7) (None, 25088)
dense_1 (None, 25088) (None, 4096)
batchnormalization_1 (None, 4096) (None, 4096)
dropout_1 (None, 4096) (None, 4096)
dense_2 (None, 4096) (None, 4096)
batchnormalization_2 (None, 4096) (None, 4096)
dropout_2 (None, 4096) (None, 4096)
dense_4 (None, 4096) (None, 2)

Hence my get_fc_model function:

def get_fc_model():
    model = Sequential([
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dense(4096, activation='relu'),
        BatchNormalization(),
        Dropout(0.),
        Dense(4096, activation='relu'),
        BatchNormalization(),
        Dropout(0.),
        Dense(2, activation='softmax')
        ])

    for l1,l2 in zip(model.layers, fc_layers): l1.set_weights(proc_wgts(l2))

    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

Where did I go wrong?

The convolution layer is index 30 and does have the output 512, 14, 14. Where is 64, 3, 224, 224 coming from?

tmu · January 15, 2017, 2:40pm

You are passing image data to the fc layers. You should pass it through the convolution layers first to get “features”, and then pass those to the fc model. It’s in the notebook.
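A cheap guard against this mix-up is to check the array shape before calling the fc model. The sketch below is a hypothetical helper (check_fc_input is not part of Keras or the course code), and the expected shape is the one from the error message above.

```python
import numpy as np

# Output shape of the last conv layer, taken from the error message in the post.
EXPECTED_FC_INPUT = (512, 14, 14)

def check_fc_input(batch):
    """Raise ValueError if `batch` does not look like conv features for the fc model."""
    if batch.shape[1:] != EXPECTED_FC_INPUT:
        raise ValueError('expected (None, %d, %d, %d) but got %s'
                         % (EXPECTED_FC_INPUT + (batch.shape,)))
    return True

images = np.zeros((2, 3, 224, 224), dtype=np.float32)    # raw image batch
features = np.zeros((2, 512, 14, 14), dtype=np.float32)  # what conv_model would produce

print(check_fc_input(features))
try:
    check_fc_input(images)
except ValueError as err:
    print(err)
```

In the real pipeline the features would come from something like conv_model.predict(images), as in the earlier posts, and only those would go to fc_model.predict.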

zaiddabaeen · January 15, 2017, 3:04pm

Excuse me, but I don’t think I understand. When we created the fc_model, we flattened the conv layers and added more FC layers. We trained it on our cats/dogs data and saved the weights. Isn’t the fc_model now the full model? I cannot seem to find it in the notebook. Should we remove the flattened layer and insert the conv layers now instead?

EDIT: Okay, it seems that yes, that is what I ought to do, as in final_model and bn_model.

Question: Why do we retrain the model even though we do have the weights for the conv and fc layers? And why do I get terrible results:

final_model = Sequential(conv_layers)
for layer in final_model.layers: layer.trainable = False
for layer in fc_model.layers: final_model.add(layer)
final_model.compile(optimizer=Adam(), 
                    loss='categorical_crossentropy', metrics=['accuracy'])
final_model.fit_generator(batches, samples_per_epoch=batches.nb_sample, nb_epoch=1, 
                        validation_data=val_batches, nb_val_samples=val_batches.nb_sample)
Epoch 1/1
25000/25000 [==============================] - 716s - loss: 0.8436 - acc: 0.9217 - val_loss: 5.8600 - val_acc: 0.3890

oregontrail256 · January 15, 2017, 10:38pm

Hi, friendly request for help!

I’m working on the State Farm competition. When training my fully-connected model (fc_model), I get terrible results, much worse than the accuracy previously obtained by the full fine-tuned VGG16 model. Can anyone help me identify why my FC model is doing so poorly, when it uses the same weights as my VGG model’s fully-connected layers and uses the proper training inputs from the conv model?

Steps:
I trained my vgg model to get 55% accuracy on the validation set. Then I split the model into the conv_model and fc_model, and evaluated the conv_model on the training/validation batches to get training/validation features, which serve as inputs into the fc_model. The fc_model is built using the fully-connected layers of VGG, with the same weights. But when I then run the fc_model, I get really bad accuracy, around 10%. I would expect starting accuracy to be ~55% since I copy the weights over.

# At this point, vgg is trained to give 55% validation accuracy
# Get last conv layer
conv_layers = [index for index, layer in enumerate(vgg.model.layers) if type(layer) is convolutional.Convolution2D]
last_conv_layer_index = max(conv_layers)

# Create conv_model and save fc_layers
conv_model = Sequential(vgg.model.layers[:last_conv_layer_index+1])
fc_layers = vgg.model.layers[last_conv_layer_index+1:]

# Calculate features from train/ and valid/
train_features = conv_model.predict_generator(batches, batches.nb_sample)
valid_features = conv_model.predict_generator(val_batches, val_batches.nb_sample)

# Generate one-hot labels for train and validation
train_classes = batches.classes
train_labels = utils.onehot(train_classes)
valid_classes = val_batches.classes
valid_labels = utils.onehot(valid_classes)

def get_fc_model(lr=0.0001, dropout=0.5):
    model = Sequential([
            MaxPooling2D(input_shape=conv_model.layers[-1].output_shape[1:]),
            Flatten(),
            Dense(4096, activation='relu'),
            Dropout(dropout),
            Dense(4096, activation='relu'),
            Dropout(dropout),
            Dense(len(categories), activation='softmax')
        ])

    # Copy weights from the old fc_layers into the matching new layers
    for old_layer, new_layer in zip(fc_layers, model.layers):
        new_layer.set_weights(old_layer.get_weights())
    
    model.compile(optimizer=Adam(lr=lr), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

fc_model = get_fc_model()
fc_model.fit(train_features, train_labels, nb_epoch=1, validation_data=(valid_features, valid_labels))

# RESULT:
Train on 5000 samples, validate on 1000 samples
Epoch 1/1
5000/5000 [==============================] - 15s - loss: 2.3584 - acc: 0.0976 - val_loss: 2.3037 - val_acc: 0.1010

As you can see, the validation accuracy is 10%, much worse than the 55% I was getting with the full VGG model. Can anyone help me?

jeremy · January 16, 2017, 12:49am

What’s in the notebook isn’t always exactly what I did - since I sometimes go back and change and rerun things. In this case, I had trained an earlier model with a dropout of 0.3, and later wanted to change it to 0.6 without retraining.
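For anyone following along, here is a sketch of the arithmetic involved. It assumes the helper scales every weight array by (1 - prev_p) / (1 - new_p), which is how I recall the notebook's proc_wgts. That formula is an assumption here, and the function names below are hypothetical.

```python
def dropout_rescale_factor(prev_p, new_p):
    """Scale factor when moving from dropout prev_p to new_p (assumed formula)."""
    return (1.0 - prev_p) / (1.0 - new_p)

def rescale_weights(weights, prev_p, new_p):
    """Apply the factor to a weight matrix given as a list of rows."""
    scale = dropout_rescale_factor(prev_p, new_p)
    return [[w * scale for w in row] for row in weights]

# 0.3 -> 0.6: the case from the Batch Normalization section.
print(dropout_rescale_factor(0.3, 0.6))   # 0.7 / 0.4, roughly 1.75

# 0.5 -> 0.0: the "Removing Dropout" case, which halves the weights.
print(dropout_rescale_factor(0.5, 0.0))

print(rescale_weights([[2.0, 4.0], [6.0, 8.0]], 0.5, 0.0))
```

Under this assumption, going from p = 0 to a higher p also changes the scale (the factor is above 1), which is why a rescale would still be applied in that direction.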

jeremy · January 16, 2017, 12:51am

We’re fine-tuning more layers for this model. As the video mentions, this task is very different from the ImageNet task, so the ImageNet weights aren’t as useful, and we have to fine-tune more layers.

jeremy · January 16, 2017, 12:53am

Sounds like a bug in your code - if you have copied the weights over correctly and used the correct conv layer features as your input, you should get identical results. You can test this by running fc_model.evaluate(…) before fitting it.

If evaluate() gives the expected answer, then you must have too high a learning rate.
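The idea behind this check can be shown with a toy NumPy stand-in. Everything below is a hypothetical sketch, not the Keras models from the posts: a fixed "conv" stage and a linear "fc" head. If the head's weights are copied correctly and fed the conv-stage features, it must reproduce the full model's predictions exactly, before any further training.

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy stand-ins for the two halves of a trained model.
W_conv = rng.randn(8, 5)
W_fc = rng.randn(5, 3)

def conv_stage(x):
    # Plays the role of conv_model.predict(...)
    return np.maximum(x.dot(W_conv), 0.0)

def full_model(x):
    # Conv stage followed by the head, returning predicted class indices.
    return conv_stage(x).dot(W_fc).argmax(axis=1)

def fc_model(features, weights):
    # The head on its own, fed precomputed features.
    return features.dot(weights).argmax(axis=1)

x = rng.randn(100, 8)
labels = full_model(x)        # treat the full model's output as the target
features = conv_stage(x)

# Weights copied correctly: agreement with the full model is perfect.
acc_copied = np.mean(fc_model(features, W_fc.copy()) == labels)

# Weights not copied (fresh random head): agreement is much lower.
acc_wrong = np.mean(fc_model(features, rng.randn(5, 3)) == labels)

print(acc_copied, acc_wrong)
```

In Keras the equivalent check would be calling fc_model.evaluate on the validation features and labels before fit. If that does not match the full model's accuracy, the problem is in the weight copy or the features, not in training.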