Lesson 3 discussion

(Stephen Lizcano) #101

I’m lost as to why Jeremy splits the model into conv_layers and fc_layers when removing dropout?

Wouldn’t it also work to just set the convolutional layers to non-trainable and train the same model with no dropout?

Or is it that splitting the model makes it easier to change the dropout and the weights?

Any feedback or thoughts appreciated. :slight_smile:


(Stephen Lizcano) #102

This is a good question… so I guess the step of halving the weights in lesson3.ipynb is unnecessary and maybe detrimental to fitting?
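For what it’s worth, here is the arithmetic behind the halving step as a plain-Python sketch (the numbers and the single weight `w` are made up for illustration). Under “classic” dropout with p = 0.5, each unit is only present half the time during training, so removing dropout doubles the expected input to the next layer unless that layer’s incoming weights are halved. As far as I can tell, Keras’s Dropout uses inverted dropout (it rescales activations at training time instead), which is why the halving may indeed be unnecessary there.

```python
# Illustrative sketch of the classic-dropout weight-halving argument.
# p, activations and w are made-up numbers, not values from the notebook.
p = 0.5                      # dropout probability
activations = [1.0, 2.0, 3.0, 4.0]
w = 0.1                      # one incoming weight of the next layer

# Expected pre-activation during training: each input survives with prob 1 - p.
expected_train = sum(a * (1 - p) * w for a in activations)

# Dropout removed, weights unchanged: the expectation doubles.
no_dropout = sum(a * w for a in activations)

# Halving the weights restores the training-time expectation.
halved = sum(a * (w * (1 - p)) for a in activations)

print(round(expected_train, 3), round(no_dropout, 3), round(halved, 3))  # 0.5 1.0 0.5
```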


@Dee Hi, I see you’re using a Python 2 env within an Anaconda 3 install, and that’s confusing me. I realise they might not be what I think they are referring to. Regards

(Sean) #104

Hi Roger,

Yep, it’s running Python 2.7 in a conda environment from a base Anaconda 3.5 install, so everything is installed in the Python 2 sub-environment. It’s just another type of virtual env and allows you to separate out all your Python dependencies.

If you are using Anaconda I would recommend you use a conda env for your work. If you are not using Anaconda then use virtualenv. It’s just good practice in Python to keep things separate.

(Sean) #105

Hi tmu,

Thanks for the suggestion. I tried this but it also didn’t work; I think it still produced the same error.

It’s possible that one of my libraries is too old or has been downgraded by Anaconda.

(Jeremy Howard) #107

Yes, but much slower. We cover this issue quite a few times in the lessons so keep watching and you’ll get the idea! Basically, you want to precompute the convolutional layers, since they’re slow. Setting to non-trainable still requires them to be computed for every epoch!
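A toy sketch of that point in plain Python (the names are illustrative, not the notebook’s code): treat the frozen conv stack as a fixed, expensive function, run it over the data once, and then loop over the cached features for as many epochs as you like.

```python
# Toy illustration: frozen layers are a fixed function, so cache their output.
call_count = 0

def slow_conv_features(x):
    """Stand-in for the frozen convolutional layers (the expensive part)."""
    global call_count
    call_count += 1
    return [v * 2 for v in x]         # pretend feature extraction

dataset = [[1, 2], [3, 4], [5, 6]]

# Precompute the conv features once...
features = [slow_conv_features(x) for x in dataset]

# ...then train the small dense layers for many epochs on the cached features.
for epoch in range(10):
    for f in features:
        pass                          # dense-layer update would happen here

print(call_count)                     # 3: the slow part ran once per image, not once per epoch
```

In the notebook this corresponds (if I remember right) to something like `conv_model.predict_generator(...)` to produce the feature arrays, then fitting the fully connected model on those.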

(KyRie) #108



I have a question:

In the video at the 1:32:35 mark, Jeremy mentioned that augmentation should not be applied to the validation set. However, at 1:54:50 the augmentation is also applied to the validation set. Am I missing something here?

Another question:
Here is a snippet of the Data Augmentation section of mnist.ipynb

    gen = image.ImageDataGenerator(rotation_range=8, width_shift_range=0.08, shear_range=0.3,
                                   height_shift_range=0.08, zoom_range=0.08)
    batches = gen.flow(X_train, y_train, batch_size=64)
    test_batches = gen.flow(X_test, y_test, batch_size=64)
    lm.fit_generator(batches, batches.N, nb_epoch=1,
                     validation_data=test_batches, nb_val_samples=test_batches.N)

In my opinion, with augmentation we have access to a practically infinite number of different images.
Wouldn’t that make the choice of batches.N as the amount of data per epoch fairly arbitrary, and the same for test_batches.N?

Even in the absence of augmentation, I’m thinking batches.N / batch_size might be more appropriate; it feels more like an epoch, except some images may get sampled more than once and some not at all.

I use Python 3 and Keras 2.0.2; batches.n returns 60000 for me, and I assume batches.N is the same.

I hope I am not speaking gibberish here!
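On the second question: with an augmenting generator the stream of batches is effectively endless, so “one epoch” is just however many samples (or steps) you tell fit_generator to draw. A plain-Python sketch of that idea (the names are illustrative, not the Keras API):

```python
import itertools
import random

random.seed(0)
data = list(range(8))                 # 8 "images"
batch_size = 4

def augmenting_batches():
    """Endless stream of batches, with fresh random 'augmentation' each time."""
    while True:
        batch = random.sample(data, batch_size)
        yield [x + random.random() for x in batch]

# One "epoch" is simply however many steps we choose to draw:
steps = len(data) // batch_size       # 2 steps, roughly one pass over the data
one_epoch = list(itertools.islice(augmenting_batches(), steps))
print(len(one_epoch), sum(len(b) for b in one_epoch))   # 2 8
```

Note the API change too: in Keras 1 the second argument to fit_generator was samples_per_epoch (so batches.N meant roughly one pass), while in Keras 2 it became steps_per_epoch, measured in batches, so batches.n // batch_size would be the one-pass choice there.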


Hi everyone,
I have a question about the code in lesson3.ipynb, In [80]:

Why do we need to set the bn_model’s weights onto bn_layers? In [79] we already added the bn_layers to the final model, but in [80] we set the bn_model’s weights onto bn_layers. I think that does nothing to the final model, right? So why would we do that?

Thanks for answering.



Actually, what you call training the model there isn’t really “training” it; the vgg model is already trained. The model.fit() we run is effectively using the model to predict the training set and validation set once, and then comparing those predictions with the original labels.



Question translation to English:
Hi guys,
I’ve finished training the vgg model for the cats_dogs_redux competition, and I saved the weights and the model, but I don’t know how to predict the test images. What should I do? Thanks.


@KyRie, Jeremy has shown how to do this in dogs_cats_redux.ipynb notebook. Please take a look at it.


@justinho, if you look at cell 78, the bn_layers are created anew; they don’t have any of the weights from the previously trained bn_model. I am not sure why he did this, but since they don’t have any weights, he sets the bn_model’s weights in cell 80.


@Manoj yes, I know that, but I think this process may have no effect on the final model, right?


@justinho Since you trained the bn_model before, by setting the weights of bn_model onto the new layers you are effectively recreating the same model. So final_model gives the same outputs as bn_model, except that we also added the conv layers and built the full model.


So the final_model is built by the following steps:

  1. stack the earlier layers using Sequential();
  2. generate the bn_layers;
  3. pop the final layer of bn_layers and add a dense layer. At this step there aren’t any weights in bn_layers, because they are brand-new layers;
  4. add the bn_layers to the final model;
  5. in cell 80, set the weights that bn_model trained before onto the bn_layers. And that’s what I don’t understand: what is that for?

Tell me if I’m wrong.


@justinho your understanding is correct.

After step 4 is completed, we have added the bn_layers, but they don’t have initial weights, since we created them from scratch. We could train the full model again to get proper weights, but since we already trained a bn_model before, we just set those weights onto the bn_layers.

After this, we can train the model again. But since we already set the weights from bn_model, we converge much faster to the optimal weights.
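A small plain-Python analogy of why cell 80 still affects final_model (the classes below are stand-ins, not real Keras code): the model stores references to the layer objects, so mutating a layer’s weights after model.add(layer) is visible through the model as well.

```python
# Stand-in classes showing reference sharing, not the real Keras API.
class Layer:
    def __init__(self):
        self.weights = None
    def set_weights(self, w):
        self.weights = w

class Model:
    def __init__(self):
        self.layers = []
    def add(self, layer):
        self.layers.append(layer)       # stores a reference, not a copy

bn_layer = Layer()
final_model = Model()
final_model.add(bn_layer)               # step 4: add the untrained layer

bn_layer.set_weights([0.5, -1.2])       # cell 80: copy the trained weights in

print(final_model.layers[0].weights)    # [0.5, -1.2], the same object
```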


Got it: after setting the weights on bn_layers, we can add them to the final model, and it will converge much faster than it would with no weights in bn_layers. Thanks @Manoj


According to the notes, batchnorm can give a 10x speed improvement. Therefore I was disappointed to see my 11-minute epoch did not fall to 1 minute when I added batchnorm. Under what circumstances does batchnorm give a 10x speed improvement?

(Marco Muscat) #120

I think that “faster” here means the model will improve in accuracy faster, not run faster.
So if, for example, your model was taking 10 epochs to get to 80% accuracy, with batchnorm it could reach the same accuracy in 5.
If anything, each epoch will be slightly slower after adding batchnorm, as the GPU is doing (a little) more calculation :slight_smile:
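To make the “(a little) more calculation” concrete, here is roughly what a batchnorm layer adds to each forward pass, as a plain-Python sketch over one activation vector (gamma and beta stand for the layer’s learned scale and shift):

```python
def batchnorm_forward(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise a batch of activations, then apply the learned scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in batch]

out = batchnorm_forward([1.0, 2.0, 3.0, 4.0])
mean_out = sum(out) / len(out)
print(abs(mean_out) < 1e-9)    # True: activations are re-centred every batch
```

So each batchnorm layer costs a mean, a variance, a normalisation and a scale-and-shift per batch, which is extra work per epoch, in exchange for needing fewer epochs.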


Hi guys,
I’m working on the vgg16bn model: first I imported the vgg16bn model, then I separated the conv layers and fc layers, and also changed the dropout.

After fitting the fc_model a few times, I wanted to add the fc_model back onto the conv model, but it raised an error:

“You are attempting to share a same BatchNormalization layer across different data flows. This is not possible. You should use mode=2 in BatchNormalization, which has a similar behavior but is shareable (see docs for a description of the behavior).”

How can I solve this problem? I’d be grateful for any ideas.