Lesson 3 discussion

There’s not really enough info there to tell - could you create a gist with your whole notebook?

Thanks. Here it is: https://gist.github.com/javedqadruddin/c491c00b109abb8cfed9c06d7545dd63

@jeremy any idea what could be going wrong here? Thanks!

You didn’t pop the last layer and replace it with one that has the correct number of outputs before creating conv_model. I’m not sure exactly why the error happens, but my guess is that that’s the cause. Keras often gets confused when copying layers between models; recently I’ve started writing code that instead copies the layer config and weights separately.
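Something like this, for example (a rough sketch against the Keras 1.x API used in the course; src_model is just an illustrative name for the pretrained model):

from keras.models import Sequential

new_model = Sequential()
for layer in src_model.layers:
    # Rebuild each layer from its config, so nothing is shared with the old model
    new_model.add(type(layer).from_config(layer.get_config()))

# Once the new model is built, copy the weights across layer by layer
for src, dst in zip(src_model.layers, new_model.layers):
    dst.set_weights(src.get_weights())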


Got it working. Copying the config and the weights separately did the trick. Thanks!


Could someone help me understand why we had to halve the weights?

Also, does this halve the weights of the corresponding fc layers only, or all of the model’s layers? I believe the model will have a lot more layers than fc_layers.

def get_fc_model():
    model = Sequential([
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.),
        Dense(4096, activation='relu'),
        Dropout(0.),
        Dense(2, activation='softmax')
        ])

    for l1,l2 in zip(model.layers, fc_layers): l1.set_weights(proc_wgts(l2))

    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

When you remove dropout, you are no longer zeroing out half of the previous layer’s activations. Therefore the total activation coming out of that layer is (approximately) twice what it was before. So to make the next layer’s weights continue to work, you’ll need to halve the weights of the layers that used to have dropout applied to their inputs.

This only needs to be done for the layers that have had their dropout removed (or adjusted).
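Here’s a tiny numeric sanity check of that argument (this assumes Caffe-style dropout, which does not rescale the surviving activations during training, matching the original VGG weights):

import numpy as np

np.random.seed(1)
acts = np.random.rand(100000)             # activations feeding the next layer
mask = np.random.rand(100000) > 0.5       # a p=0.5 dropout mask
print((acts * mask).sum() / acts.sum())   # ~0.5: dropout passes about half the signal
# With dropout removed, the next layer sees roughly twice the input it was
# tuned for, so halving its weights keeps its outputs on the same scale.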


OK… but given that dropout only randomly zeroes out the activations, we might still be doubling some of these weights… isn’t that right?

Plus, if dropout is, say, 0.8, then 80% of the activations were being dropped, so removing dropout doesn’t just double the signal; the effect will be much larger. But by computing o/2 we are always just halving the weights, which won’t compensate for the dropout that was removed.

def proc_wgts(layer): return [o/2 for o in layer.get_weights()]

Does that make any sense? Sorry, it’s hard to explain what I’m trying to say.


proc_wgts() specifically works for removing dropout layers with p=0.5. So by definition, on average half of those activations are being zeroed out.

Is making the conv layers nontrainable the same as creating an FC model with just the FC layers and running fit_generator on it with the conv output? I am going over the lesson 3 notebook, and I am trying to understand when layers are made nontrainable.

After data augmentation, conv layers are made nontrainable. Why is that?
> for layer in conv_model.layers: layer.trainable = False

Somehow, my intuition says making some layers nontrainable is sub-optimal for accuracy, since those layers no longer get updated by backpropagation. Or am I misunderstanding something?

A few questions about BatchNormalization:

  1. I see that we are not retraining the whole VGG model here, as we think training 120M parameters using 20K images is not a good idea; wasn’t the same true when we added/updated other layers, like Dropout?
  2. I don’t follow the purpose of this code block: why are we adjusting the weights of all the Dense layers? And what do 0.3 and 0.6 signify?

def proc_wgts(layer, prev_p, new_p):
    scal = (1-prev_p)/(1-new_p)
    return [o*scal for o in layer.get_weights()]

for l in bn_model.layers:
    if type(l)==Dense: l.set_weights(proc_wgts(l, 0.3, 0.6))

We are setting them not to learn (i.e., not to update their weights) because we do not need to, which saves a bit of computation: they have already learned lower-level features like edges, shapes, and similar objects (e.g., animals) from being trained on the ImageNet dataset.


Isn’t that true when we add BatchNormalization as well?

Yes. The latter is preferred since it’s much faster, if you’re doing more than a couple of epochs.
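Roughly, the faster approach looks like this (a sketch using the lesson 3 notebook’s names: conv_model, get_fc_model, trn_labels, val_labels, and the batch generators are all assumed to be defined as in that notebook):

# Precompute the conv features once, then train only the small FC model on them.
# Note: the generators must be created with shuffle=False so the precomputed
# features stay aligned with trn_labels/val_labels.
trn_features = conv_model.predict_generator(batches, batches.nb_sample)
val_features = conv_model.predict_generator(val_batches, val_batches.nb_sample)

fc_model = get_fc_model()   # the small FC-only model shown earlier in the thread
fc_model.fit(trn_features, trn_labels, nb_epoch=3, batch_size=64,
             validation_data=(val_features, val_labels))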

That’s certainly true - but the early layers are so general (e.g. remember Zeiler’s visualizations - layer 1 just finds edges and gradients) that it’s extremely unlikely that you’ll need to change them, unless you’re looking at very different kinds of images. e.g. if you’re classifying line drawings, instead of photos, you’ll probably need to retrain many conv layers too.


Sorry I’m not following this question - can you clarify?

This is for when we have a pre-trained model that used a different amount of dropout to what we wish to use for our model. E.g. if the pretrained model used p=0.5 and we wish to use p=0.0, we’ll need to halve all the weights: scal = (1-0.5)/(1-0.0) = 0.5. In this case, I took a pretrained model that used p=0.3 and wished to change it to p=0.6, so each Dense layer’s weights get scaled by (1-0.3)/(1-0.6) = 1.75.

NVM – I now have the path pointing to ft2.h5 in /results which appears to have legs – thx!!

I’m working through the Lesson 3 notebook and I’m hitting an "Exception: You are trying to load a weight file containing 17 layers into a model with 16 layers." error when running model.load_weights(model_path+'finetune3.h5'). I tried changing finetune3.h5 to finetune2.h5 and finetune1.h5 but still received the same error. Searching the forums and the web didn’t yield any results either. Any ideas?

Exception                                 Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 model.load_weights(model_path+'finetune3.h5')

/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in load_weights(self, filepath, by_name)
   2498             self.load_weights_from_hdf5_group_by_name(f)
   2499         else:
-> 2500             self.load_weights_from_hdf5_group(f)
   2501
   2502         if hasattr(f, 'close'):

/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in load_weights_from_hdf5_group(self, f)
   2550                              'containing ' + str(len(layer_names)) +
   2551                              ' layers into a model with ' +
-> 2552                              str(len(flattened_layers)) + ' layers.')
   2553
   2554         # we batch weight value assignments in a single backend call

Exception: You are trying to load a weight file containing 17 layers into a model with 16 layers.


Steve,

Some things you can check: if you use model.summary() it should show you the current layers in the model. I think VGG16 has 16 layers, hence the weights file should have 16 layers.

It looks like there may be an extra layer attached to the model that generated your weights file.

model.pop() would remove this, but it might be worth checking what the extra layer is first. The finetuned VGG16 model for dogs/cats should have a final Dense layer with a softmax activation and an output of 2 categories, replacing the original final layer, which was similar but output 1000 categories.
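A rough sketch of that replacement, following the course’s finetuning pattern (Keras 1.x; model here is the full VGG16 Sequential model, and the optimizer choice is just illustrative):

from keras.layers import Dense

model.pop()                                  # drop the 1000-way ImageNet output layer
for layer in model.layers: layer.trainable = False
model.add(Dense(2, activation='softmax'))    # new 2-way dogs/cats output
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
              metrics=['accuracy'])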

Hope that helps.


First of all, thanks for this wonderful course, Jeremy. I only wish it had come out a bit earlier: the 3 days spent watching your videos were more informative than 4 months of fiddling around with Keras. :) I appreciate all the best practices and tricks of the trade that you shared.

VGG-16 worked very well in my last project, classifying building photos. Now I am starting on classifying localities in a GIS (MapInfo) based on the type and density of the buildings in each locality, so the input would be screen captures from the GIS. I am guessing that in this case I would have to make the convolutional layers trainable as well. Or would I be better off training a small convnet from scratch?

I am working on the data collection right now. I am guessing I will have around 100 screen captures per class, prior to augmentation, for training and validation.

Thanks
Satish

Correct me if I’m wrong: in the lesson 3 notebook, fc_layers has BatchNormalization layers while the new model inside get_fc_model() does not. Doesn’t that mean the weight-copying process inside get_fc_model() is bound to fail?

You might be right - we switched in lesson 4 to adding batchnorm to VGG, and went back and changed some notebooks. Apologies if there are still some inconsistencies!


So I have two questions which may look stupid, but I can’t seem to understand:

  • For the mean (vgg_mean) that we subtract from each element of the array, can’t we just use 128, since we know the values will always be within (0, 255)?

  • I see we change from RGB to BGR. What is the reason for doing so? I read that OpenCV uses BGR for reading image files, but I don’t see any usage of OpenCV here. Am I missing something?
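For reference, the preprocessing being asked about looks roughly like this in the course’s vgg16.py (reproduced approximately, so treat it as illustrative rather than verbatim):

import numpy as np

# Per-channel means of the ImageNet training set, in RGB order
vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3, 1, 1))

def vgg_preprocess(x):
    x = x - vgg_mean     # subtract the per-channel ImageNet mean
    return x[:, ::-1]    # reverse the channel axis: RGB -> BGR (Theano dim ordering)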