Lesson 3 discussion

xbno · October 3, 2017, 9:37pm

Hi all, I’m seeing some odd behavior in my val_loss and val_acc during training. They wobble all over the place instead of being consistent. I’ve tried a few architectures on the sample redux data, and it seems to happen regardless of architecture. My hypothesis is that the data_aug makes the training images too different from the val set. Has anyone run into this before? In particular, a val_loss of 7 sandwiched between 0.13 and 0.47?

I’ve taken the vggbn model and kept the last 3 conv layers and 1st dense layer trainable. Then I’ve popped everything after that and added my own tail.

Epoch 1/10
125/125 [==============================] - 48s - loss: 0.4372 - acc: 0.8295 - val_loss: 0.3996 - val_acc: 0.9060
Epoch 2/10
125/125 [==============================] - 40s - loss: 0.1999 - acc: 0.9220 - val_loss: 1.0032 - val_acc: 0.7903
Epoch 3/10
125/125 [==============================] - 48s - loss: 0.1715 - acc: 0.9315 - val_loss: 0.2281 - val_acc: 0.9215
Epoch 4/10
125/125 [==============================] - 47s - loss: 0.1418 - acc: 0.9440 - val_loss: 0.1141 - val_acc: 0.9690
Epoch 5/10
125/125 [==============================] - 40s - loss: 0.1407 - acc: 0.9505 - val_loss: 0.6629 - val_acc: 0.8543
Epoch 6/10
125/125 [==============================] - 40s - loss: 0.1143 - acc: 0.9595 - val_loss: 0.5840 - val_acc: 0.9194
Epoch 7/10
125/125 [==============================] - 40s - loss: 0.1154 - acc: 0.9630 - val_loss: 0.1162 - val_acc: 0.9731
Epoch 8/10
125/125 [==============================] - 40s - loss: 0.0913 - acc: 0.9675 - val_loss: 0.1349 - val_acc: 0.9628
Epoch 9/10
125/125 [==============================] - 40s - loss: 0.0978 - acc: 0.9645 - val_loss: 7.4255 - val_acc: 0.5362
Epoch 10/10
125/125 [==============================] - 40s - loss: 0.0995 - acc: 0.9625 - val_loss: 0.4757 - val_acc: 0.8988
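To put a number on how erratic a validation curve like this is, here is a minimal pure-Python sketch that pulls val_loss and val_acc out of Keras 1-style progress lines (the ten lines from the run above) and flags epochs whose val_loss is far above the run’s median. The function names are hypothetical, and the 3x-median threshold is an arbitrary choice for illustration, not a standard rule.

```python
import re
from statistics import median

# Progress lines copied from the run above (epochs 1-10).
LOG = """\
125/125 [==============================] - 48s - loss: 0.4372 - acc: 0.8295 - val_loss: 0.3996 - val_acc: 0.9060
125/125 [==============================] - 40s - loss: 0.1999 - acc: 0.9220 - val_loss: 1.0032 - val_acc: 0.7903
125/125 [==============================] - 48s - loss: 0.1715 - acc: 0.9315 - val_loss: 0.2281 - val_acc: 0.9215
125/125 [==============================] - 47s - loss: 0.1418 - acc: 0.9440 - val_loss: 0.1141 - val_acc: 0.9690
125/125 [==============================] - 40s - loss: 0.1407 - acc: 0.9505 - val_loss: 0.6629 - val_acc: 0.8543
125/125 [==============================] - 40s - loss: 0.1143 - acc: 0.9595 - val_loss: 0.5840 - val_acc: 0.9194
125/125 [==============================] - 40s - loss: 0.1154 - acc: 0.9630 - val_loss: 0.1162 - val_acc: 0.9731
125/125 [==============================] - 40s - loss: 0.0913 - acc: 0.9675 - val_loss: 0.1349 - val_acc: 0.9628
125/125 [==============================] - 40s - loss: 0.0978 - acc: 0.9645 - val_loss: 7.4255 - val_acc: 0.5362
125/125 [==============================] - 40s - loss: 0.0995 - acc: 0.9625 - val_loss: 0.4757 - val_acc: 0.8988
"""

# Matches the validation part of a Keras 1-style progress line.
VAL_PATTERN = re.compile(r"val_loss: ([0-9.]+) - val_acc: ([0-9.]+)")


def parse_val_metrics(log):
    """Return one (val_loss, val_acc) tuple per epoch line found in `log`."""
    return [(float(loss), float(acc)) for loss, acc in VAL_PATTERN.findall(log)]


def flag_spikes(losses, factor=3.0):
    """Return 1-based epoch numbers whose loss exceeds `factor` times the median."""
    threshold = factor * median(losses)
    return [epoch for epoch, loss in enumerate(losses, start=1) if loss > threshold]


if __name__ == "__main__":
    val_losses = [loss for loss, _ in parse_val_metrics(LOG)]
    print("median val_loss:", median(val_losses))
    print("spike epochs:", flag_spikes(val_losses))
```

Pasting your own log into LOG gives a quick way to compare runs (for example, before and after changing shuffling or the learning rate) without eyeballing the numbers.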

xbno · October 4, 2017, 4:14pm

A follow-up: I’ve tried modifying the data augmentation, which didn’t do anything, but setting shuffle=False on the validation batches and lowering the learning rate seems to have made training at least more consistent and better performing.

Rhino · October 10, 2017, 2:32pm

I don’t think so. This paper talks about it:

“We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.”

Distilling the Knowledge in a Neural Network - https://arxiv.org/abs/1503.02531

gopi34 · October 19, 2017, 6:58am

Can somebody help me with this? I don’t know why I’m getting this error.

msgrasser · October 25, 2017, 4:20am

Also, does this halve the weights of the corresponding fc layers only, or of all the model’s layers? I believe model will have a lot more layers than fc_layers.

Embarrassingly enough, I was stuck on this for a bit too until I looked about 10 lines up from there, where another var called model is declared within the scope of the get_fc_model function:

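from keras.models import Sequential
from keras.layers import MaxPooling2D, Flatten, Dense, Dropout

# conv_layers, fc_layers, proc_wgts and opt are defined earlier in the lesson 3 notebook.
def get_fc_model():
    # This `model` is local to the function; it is NOT the notebook-level VGG `model`.
    model = Sequential([
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.),
        Dense(4096, activation='relu'),
        Dropout(0.),
        Dense(2, activation='softmax')
        ])

    # Copy (processed) weights from each original FC layer into its counterpart
    # in this new model; the conv layers are not involved.
    for l1, l2 in zip(model.layers, fc_layers):
        l1.set_weights(proc_wgts(l2))

    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model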

It is confusing because a different variable called model was defined earlier at the notebook level, and it holds the whole fine-tuned VGG model. Anyway, I wanted to respond for completeness’ sake, and hopefully this will save someone a bit of time too.
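Both points can be seen in a plain-Python analogy, with dicts and lists standing in for Keras objects (all names here are hypothetical): the function’s local model shadows the notebook-level one without modifying it, and the zip only walks the FC layers, so only their weights get processed. In the lesson 3 notebook, proc_wgts appears to halve each FC weight array to compensate for setting dropout to 0, but verify that against your copy; the toy version below takes a list of numbers rather than a Keras layer.

```python
def proc_wgts(weights):
    """Hypothetical stand-in for the notebook helper: halve every weight value.

    This toy version takes a list of numbers rather than a Keras layer.
    """
    return [w / 2 for w in weights]


# Notebook-level `model`: a stand-in for the whole fine-tuned VGG (conv + FC).
model = {"layers": ["conv_1", "conv_2", "conv_3", "fc_1", "fc_2", "fc_out"]}

# Toy weights for the original FC layers only.
fc_layers = [[4.0, 2.0], [8.0], [6.0]]


def get_fc_model():
    # This `model` is a new local variable; it shadows, but does not modify,
    # the notebook-level `model` above.
    model = {"layers": ["new_fc_1", "new_fc_2", "new_fc_out"], "weights": []}
    # zip pairs each new layer with exactly one original FC layer, so only
    # FC weights are processed; the conv layers never appear in this loop.
    for _new_layer, old_weights in zip(model["layers"], fc_layers):
        model["weights"].append(proc_wgts(old_weights))
    return model


fc_model = get_fc_model()
print(fc_model["weights"])
print(len(model["layers"]), "layers in the notebook-level model, unchanged")
```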

ghadiya · October 28, 2017, 6:35pm

Hi,
In the lesson 3 notebook, vgg16bn is used instead of vgg16 when adding batchnorm.
Why can’t vgg16 be used? What is the basic difference between the two?

Thanks & Regards

sirkal · October 30, 2017, 12:46am

I am having difficulty following the examples. The code in the video and the lesson 3 notebook do not match, and I am not sure whether to follow the lesson 3 notebook or the imagenet_batchnorm notebook. Also, if I have to follow the imagenet_batchnorm notebook, do I need to download the ImageNet data as suggested under the solution section?

sirkal · October 30, 2017, 11:35am

Do I have to download the ImageNet data for the cats and dogs batch normalization, as suggested under the solution section in the imagenet_batchnorm1 notebook?

Gatecrash · November 27, 2017, 4:30pm

Hi! I am having difficulties fine-tuning the other Dense layers in VGG while trying to finish the State Farm Kaggle competition. I have fine-tuned the last Dense layer and would like to train the two other fully connected layers. I did this by calling “vgg.finetune” directly.

I’m not much of a coder, as I started learning Python only a little over a month ago, so this might be a “silly” question, but I can’t figure out what I should do.

Do I need to import something from keras.models or keras.layers to make this work?

Or do I need to replace “first_dense_idx” with the index of the first dense layer? If so, how can I find out what the index is?

Gatecrash · November 27, 2017, 5:35pm

Solved it myself.

You first have to define model = vgg.model and then layers = model.layers

And you get the index of layers by typing the following:

for i, layer in enumerate(model.layers):
    print(i, layer.name)

And then you can define which layers to set to trainable.

For example:

for layer in model.layers[33:]:
    layer.trainable = True
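If you’d rather not read the index off the printed list, you can compute it from the layer types; I believe the lesson notebooks find last_conv_idx the same way. This sketch uses tiny hypothetical stand-in classes so it runs without Keras; with a real model you would loop over model.layers and compare against keras.layers.Dense.

```python
class Layer:
    """Minimal stand-in for a Keras layer: a name and a trainable flag."""

    def __init__(self, name):
        self.name = name
        self.trainable = False


# Hypothetical stand-ins for the Keras layer classes; only their types matter here.
class Convolution2D(Layer):
    pass


class Flatten(Layer):
    pass


class Dense(Layer):
    pass


class Dropout(Layer):
    pass


layers = [
    Convolution2D("conv_1"), Convolution2D("conv_2"), Flatten("flatten"),
    Dense("dense_1"), Dropout("dropout_1"), Dense("dense_2"),
    Dropout("dropout_2"), Dense("dense_3"),
]

# Index of the first Dense layer, found by type rather than by reading printed names.
first_dense_idx = [i for i, layer in enumerate(layers) if type(layer) is Dense][0]

# Make that layer and everything after it trainable.
for layer in layers[first_dense_idx:]:
    layer.trainable = True

print(first_dense_idx, [layer.trainable for layer in layers])
```

With real Keras, remember to compile the model again after changing trainable; the change only takes effect at compile time.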

gnavink · November 29, 2017, 11:56am

Hi @rachel,

I was referring to the mnist notebook, which provides an end-to-end model for doing regularization.
In creating a model with a single hidden layer, Jeremy uses a softmax activation layer. The snapshot is shown below:

Why should a softmax layer be used in an intermediate layer? My understanding is that it should only be used in the last layer…
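For reference, here is what the two activations do to the same hidden-layer values in plain Python (a sketch, not the notebook’s code): relu passes each unit through independently, while softmax makes the units positive, forces them to sum to 1 and so makes them compete with each other, which is why it is normally used as the output layer of a classifier.

```python
import math


def softmax(xs):
    """Numerically stable softmax: exponentiate (shifted by the max), then normalize."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relu(xs):
    """Elementwise max(0, x): units stay independent and unbounded above."""
    return [max(0.0, x) for x in xs]


hidden = [2.0, 1.0, -1.0, 0.5]
print("relu:   ", relu(hidden))
print("softmax:", [round(p, 3) for p in softmax(hidden)])
```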

MarkD · December 11, 2017, 8:08pm

Hi, I am experiencing the same problem. I don’t understand what is meant here. How do I copy the config? Do I need to create the model from scratch, i.e. add all the layers in Keras? In that case, how do I handle vgg_preprocess?

Alternatively, I have tried simply using the existing model and deleting all layers after the last conv layer, i.e.:

# Number of layers after the last conv layer
num_del = len(layers) - last_conv_idx - 1
for _ in range(num_del):
    model.pop()
conv_model = model

(Note: I create the FC model first so I don’t lose the FC weights.) I would have thought that this would produce the correct conv model, as the weights are intact. But when I run conv_model.predict_generator(batches, batches.nb_sample), the output shape is (23000, 2), which obviously will not work as input for the FC model.

The last layers of conv_model do have the correct dimensions, so I would expect an output shape of (23000, 512, 14, 14):

Layer (type)                      Output Shape          Param #   Connected to
convolution2d_12 (Convolution2D)  (None, 512, 14, 14)   0         zeropadding2d_12[0][0]
zeropadding2d_13 (ZeroPadding2D)  (None, 512, 16, 16)   0         convolution2d_12[0][0]
convolution2d_13 (Convolution2D)  (None, 512, 14, 14)   0         zeropadding2d_13[0][0]
Total params: 0
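The shapes in that summary are self-consistent: ZeroPadding2D((1, 1)) adds one pixel on each side (14 to 16), and a 3x3 convolution with stride 1 and no padding of its own removes two (16 back to 14), so a stack ending at convolution2d_13 should indeed emit (None, 512, 14, 14). A quick sketch of that arithmetic, assuming square feature maps; the helper names are hypothetical.

```python
def zero_pad(size, pad=1):
    """Spatial size after ZeroPadding2D with `pad` pixels on each side."""
    return size + 2 * pad


def conv_output_size(size, kernel=3, stride=1):
    """Spatial size after an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


# The block in the summary above: ZeroPadding2D((1, 1)) followed by a 3x3 conv.
size = 14
padded = zero_pad(size)                # 14 -> 16
after_conv = conv_output_size(padded)  # 16 -> 14
print(padded, after_conv)
```

Since the layer shapes check out, the (23000, 2) result suggests predict_generator is still going through the model’s old softmax output rather than the popped stack. Building a fresh Sequential from layers[:last_conv_idx+1], which I believe is how the lesson notebook creates conv_model, avoids relying on pop() to rewire the outputs.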

Thanks for any help

amil · December 24, 2017, 8:05am

Hello,
In the last part of Lesson 3, ensembling was explained. If I remember correctly, ensembling combines different models when each individual model is different and performs well on different parts of the data. However, the same architecture and parameters were used to make the 6 models. What is the difference between the models?
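Even with the same architecture and hyperparameters, each model usually starts from different random weights (and, with shuffling and augmentation, sees the data in a different order), so the trained models end up in different places and make somewhat different errors; averaging their predicted probabilities tends to smooth those out. A toy sketch of both ideas, with hypothetical helper names and no real training:

```python
import random


def init_weights(n, seed):
    """Toy 'same architecture' initializer: n small random weights for one model."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.01) for _ in range(n)]


def ensemble_average(predictions):
    """Average class probabilities across models.

    predictions[m][s][c] is model m's probability for class c on sample s.
    """
    n_models = len(predictions)
    n_samples = len(predictions[0])
    n_classes = len(predictions[0][0])
    return [
        [sum(p[s][c] for p in predictions) / n_models for c in range(n_classes)]
        for s in range(n_samples)
    ]


# Same architecture, different seeds: different starting weights.
print(init_weights(3, seed=0))
print(init_weights(3, seed=1))

# Two models' probabilities for two samples, averaged per class.
print(ensemble_average([[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.4, 0.6]]]))
```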

cgundlach13 · January 31, 2018, 1:31am

Hello, I’m having some issues fine-tuning the model. When I load my weights and set the dense layers to be trainable, I’m stuck at around 0.5 accuracy. I’m puzzled and not sure how to solve the problem. Does anybody know how to fix this?

saksham219 · April 23, 2018, 6:34pm

I am trying to build the satellite model in Keras. I have used a pretrained VGG19 and trained the model on 64x64 images. Now I want to train the same model on 128x128 images, just as is done in this lesson. How should I go about this?

The problem is that when building the model in Keras, we have to specify the size of the inputs as a parameter.
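The size dependence comes from the head, not the conv weights: VGG19’s 3x3 convolution kernels have the same shapes at any input size, but its five 2x2 max-pools shrink a 64x64 input to a 2x2 feature map and a 128x128 input to 4x4, so anything after Flatten sees a different number of features. A quick sketch of that arithmetic, assuming the standard VGG layout (‘same’-padded convolutions, five pools) and 512 channels at the end; the helper names are hypothetical.

```python
def vgg_feature_map_size(input_size, n_pools=5):
    """Spatial size at the end of a VGG conv base.

    Assumes 'same'-padded 3x3 convs (size unchanged) and n_pools 2x2 max-pools
    with stride 2 (each halves the size, rounding down).
    """
    size = input_size
    for _ in range(n_pools):
        size //= 2
    return size


def flatten_length(input_size, channels=512):
    """Number of features a Flatten layer would produce after the conv base."""
    s = vgg_feature_map_size(input_size)
    return s * s * channels


for side in (64, 128, 224):
    print(side, vgg_feature_map_size(side), flatten_length(side))
```

So one approach is to build a new model with the 128x128 input shape, copy the conv weights over from the 64x64 model (their shapes match), and retrain the dense layers; a global pooling layer in place of Flatten would also make the head independent of input size.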

learningML89 · December 4, 2019, 6:27pm

Hi,

I’m working on lesson 3 and have gotten through the installation of 7zip. I’m trying to unpack the data and get the following error:

"ERROR: No more files
{path}

System ERROR:
Unknown error -2147024872
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors"

And when I try to run the first line of code in the multiclassification section, I get the error “name ‘pd’ is not defined”

Can anyone help me figure out what’s going on?

fernfialho · January 25, 2020, 3:19pm

I’m getting the same error:
System ERROR:
Unknown error -2147024872
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors

Did anyone else run into this issue and find a solution?

fernfialho · January 31, 2020, 7:53pm

I was able to solve the issue by:

downloading the files to my machine and then uploading them to Jupyter Notebook (this takes a little while)
changing the path to path = '/home/jupyter/tutorials/fastai/course-v3/nbs/dl1/data/planet/'
making sure that the file “train-jpg.tar.7z” was the one being unpacked by 7zip, not “train_v2.csv.zip” (note that the csv file is unzipped by “! unzip -q -n {path}/train_v2.csv.zip -d {path}”)
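Both tools’ messages appear in the error above, so the unpack step seems to pipe 7zip’s output into tar. If the 7zip step fails (for example, because the archive isn’t where path points), tar has nothing valid to read, which would fit the “does not look like a tar archive” message. A small pre-check you could run before unpacking; the function names are hypothetical, and it only checks that each file exists and is non-empty, not that it is a valid archive.

```python
from pathlib import Path


def archive_ready(path, name):
    """Return True if `name` exists under `path` as a non-empty file.

    This only checks presence and size; it does not validate the archive.
    """
    f = Path(path) / name
    return f.is_file() and f.stat().st_size > 0


def missing_archives(path, names):
    """List which of the expected archive files are absent or empty."""
    return [n for n in names if not archive_ready(path, n)]
```

For example, missing_archives(path, ['train-jpg.tar.7z', 'train_v2.csv.zip']) returning a non-empty list means the download or the path needs fixing before running 7zip.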