Lesson 2 discussion

Hey guys,

I’m following Jeremy’s solution for creating the submission. Everything is fine up to the point where I run the fit:

vgg.fit(batches, val_batches, nb_epoch=1)

When I run this line I get the following exception:
Exception: The model needs to be compiled before being used.

Has anyone seen this before?


Could anyone tell me what the purpose of vgg_mean is in the vgg16bn source?

It is used for input normalization. Input normalization is a common practice in machine learning: typically you take the inputs, subtract the mean, and divide by the standard deviation. See this Wikipedia article for details. The motivation: if you don’t do it, some of your inputs can be very large, which leads to very large activation values and large gradients, making the model harder to train.
In our case we don’t need to divide by the standard deviation, because our inputs are constrained to the interval from 0 to 255, so we can just subtract the mean values. vgg_mean is an array of the mean values for each channel of the ImageNet dataset.
Jeremy explains it in more detail in Lesson 3.
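
For reference, the preprocessing in the course’s vgg16.py looks roughly like this (a sketch from memory, so check the actual source; the constants are the per-channel ImageNet means):

import numpy as np

# Per-channel ImageNet means (RGB order), shaped to broadcast over (channels, rows, cols) images
vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3, 1, 1))

def vgg_preprocess(x):
    x = x - vgg_mean      # centre each channel around zero
    return x[:, ::-1]     # reorder RGB -> BGR, which is what the original VGG weights expect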

Hey, so I kept trying to implement the code for this lesson by myself, and the results are still disastrous.
Here is the relevant code:

vgg = Vgg16()
model = vgg.model

model.pop()
for layer in model.layers: layer.trainable=False
model.add(Dense(2, activation='softmax'))

img_width, img_height = 224, 224
gen=image.ImageDataGenerator()
batches = gen.flow_from_directory(path+'train', target_size=(img_width, img_height), batch_size=batch_size, shuffle=True)
val_batches = gen.flow_from_directory(path+'valid', target_size=(img_width, img_height), batch_size=batch_size, shuffle=False)

opt = RMSprop(lr=0.1)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

fit_model(model, batches, val_batches, nb_epoch=2)

which outputs val_loss: 0.3922 - val_acc: 0.97. Then:

probs = model.predict_generator(val_batches, val_batches.nb_sample)

However I keep getting

The reason I’m trying to run the code using flow_from_directory and predict_generator is that running the code with the provided get_data function results in a “Cannot allocate memory” error. However, these two functions give me bad results. I have read the documentation for both and cannot find what I am doing wrong. I’d be glad if someone could shed some light on this. Thanks in advance!

Francisco

Hi all,

I have a quick question regarding the differences between using the Theano and Tensorflow backends, which may or may not be answerable!

So, I’ve run exactly the same model using both backends, but the end results are quite different: 97.7% accuracy using Theano, but only around 91%-92% using Tensorflow. I’m using the latest packages available for both backends, and was just wondering if there is a (relatively) simple answer as to why there is such a difference in results when training the same model on the same data?

Cheers,

Paul.


One possible explanation was provided in the Lesson 1 forum thread.
Theano and TensorFlow implement convolutional layers in different ways, so if weights from a Theano model are loaded into a TensorFlow model, the weights have to be converted.
I didn’t check whether this conversion gets you to 97-98% accuracy on TensorFlow. Please tell us about your results if you try it.
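
If you want to try it, Keras ships a helper for this; a minimal sketch, assuming Keras 2 (where the helper lives in keras.utils.layer_utils) and a hypothetical Theano-trained weights file:

from keras.utils.layer_utils import convert_all_kernels_in_model

model.load_weights('vgg16_weights_th.h5')    # hypothetical filename for Theano-trained weights
convert_all_kernels_in_model(model)          # flips the convolution kernels to the other backend's convention

Whether that alone closes the whole accuracy gap I can’t say, so treat it as a starting point.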


Hi,

This might be a straightforward question to most with a stronger coding background, but I was wondering how Jeremy determined the folder structure needed to run his dogs-cats redux code (see screenshot below)? I understand that this is the structure needed to run vgg16(), but it would be great if someone could point me to information on why the data needs to be structured this way. Thanks!

Hey @Codegnosis

I was looking to get a PC of similar specs to utilize instead of AWS and had a few questions if you didn’t mind :slight_smile:

How does it fare comparatively, or would you have any advice regarding setting it up out of the box?

What is your setup/environment with this PC?

Thanks in advance,

Ian

You can refer to the link below for setting it up on Windows (Keras + Theano + CPU):
http://wiki.fast.ai/index.php/Local_install_(Windows_only:cpu)

Hi Ian,

Sure no problem. I use Linux (not used Windows for about 17 years), and the setup was pretty straightforward. For the Python environment, I use Anaconda 3, which sits on Python 3.5, so some tweaks are required to Jeremy’s code to get it working, but they are pretty trivial changes for the most part. I’m also using Keras 2, so some of the Keras API code in the notebooks and vgg16.py etc. also needs modifying (again, pretty trivial changes), and Tensorflow backend (no code changes required - it just needs a change in keras.json).
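
To give a flavour of those Keras 2 tweaks, here is a rough sketch of one typical change (using the generator objects from the notebooks; double-check the argument names against your installed version):

# Keras 1 style, as in the course notebooks:
# model.fit_generator(batches, samples_per_epoch=batches.nb_sample, nb_epoch=1,
#                     validation_data=val_batches, nb_val_samples=val_batches.nb_sample)

# Roughly equivalent Keras 2 call: the arguments were renamed and now count
# batches (steps) per epoch rather than individual samples.
model.fit_generator(batches,
                    steps_per_epoch=batches.samples // batches.batch_size,
                    epochs=1,
                    validation_data=val_batches,
                    validation_steps=val_batches.samples // val_batches.batch_size)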

Setting up Cuda and cuDNN to work with Tensorflow took a little effort (not sure if you’re going to use Linux or Windows, but if you need some tips for getting it set up in Linux, please let me know), but nothing too difficult.

Regarding processing times, it fares well - most of the model.fit calls in the Lesson 2 notebook took around 180s per epoch, with a batch size of 32.

Hope that helps, and please let me know if you have any specific environment setup questions,

Cheers,

Paul.


Thanks Paul! That gives me a good primer before I decide to make a switch when I get deeper into it.

Ian

@ltshan Thank you!

I have a question regarding the vgg model. I looked at the summary of the model and the last layer is just a Dense layer, so it basically does a matrix multiplication on evaluation. How come the output values from it are always between 0 and 1, and never above 1 or below 0? Mathematically, why does this happen?

Another question: what does it mean for softmax to be an activation function? In the Stanford notes, they talk about softmax as a loss function. I’m not sure what it means for it to be an activation function.

Thanks for sharing this code! Very useful.

I face the same issue. Did you get any solution for this?

The ‘ft’ function adds a new Dense layer using this code:

model.add(Dense(num, activation='softmax'))

Note that we set a value for the activation parameter here. It adds a softmax activation function after the matrix multiplication. The result of the softmax function is a vector of non-negative values that sum to 1.
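
A quick numerical sketch of what softmax does to the raw Dense outputs (plain numpy, purely for illustration):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, -1.0, 0.5])   # raw Dense-layer outputs can be any real numbers
print(softmax(scores))                # ~[0.79, 0.04, 0.18]: all in (0, 1) and summing to 1

That is why the model’s outputs never fall outside 0 to 1: the exponentials are always positive, and the division normalises them to sum to 1.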

Thanks for sharing this code.

Are you then calling the flow method like this?

batches = gen.flow(trn_features, trn_labels, batch_size=batch_size, shuffle=True)

Or am I missing something?

Why does the training error after the 5th (final) epoch differ from the error obtained from lm.evaluate()? I suppose they should be the same, as the final weights are determined by the final epoch. What might be causing the difference?

I’ve had this issue a number of times now, so I thought I’d make a little recap of it and possible solutions etc. to help people in the future.

Issue: Model predicts one of the 2 (or more) possible classes for all data it sees*

Confirming the issue is occurring: Method 1: the model’s accuracy stays around 0.5 while training (or 1/n, where n is the number of classes). Method 2: get the counts of each class in the predictions and confirm it is predicting only one class.
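
For Method 2, a minimal sketch (assuming a Keras 1 style generator with an nb_sample attribute, as in the course notebooks):

import numpy as np

probs = model.predict_generator(val_batches, val_batches.nb_sample)
print(np.bincount(probs.argmax(axis=1)))   # all counts landing in a single bin => the issue above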

Fixes/Checks (in somewhat of an order):

  • Double Check Model Architecture: use model.summary(), inspect the model.
  • Check Data Labels: make sure the labelling of your train data hasn’t got mixed up somewhere in the preprocessing etc. (it happens!)
  • Check Train Data Feeding Is Randomised: make sure you are not feeding your train data to the model one class at a time. For instance if using ImageDataGenerator().flow_from_directory(PATH), check that param shuffle=True and that batch_size is greater than 1.
  • Check Pre-Trained Layers Are Not Trainable**: If using a pre-trained model, ensure that any layers that use pre-trained weights are NOT initially trainable. For the first epochs, only the newly added (randomly initialised) layers should be trainable; for layer in pretrained_model.layers: layer.trainable = False should be somewhere in your code.
  • Ramp Down Learning Rate: Keep reducing your learning rate by factors of 10 and retrying. Note you will have to fully reinitialize the layers you are trying to train each time you try a new learning rate. (For instance, I had a case of this that was only solved once I got down to lr=1e-6, so keep going!) A sketch of these last two checks follows this list.
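
A minimal sketch of those last two checks together, assuming a Keras model where only a newly added final Dense layer should train (the optimiser is just an example):

from keras.optimizers import RMSprop

# Freeze everything except the newly added (randomly initialised) final layer
for layer in model.layers[:-1]:
    layer.trainable = False

# Re-compile after changing trainability, and try progressively smaller learning rates
model.compile(optimizer=RMSprop(lr=1e-4),   # then 1e-5, 1e-6, ... if training is still stuck
              loss='categorical_crossentropy',
              metrics=['accuracy'])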

If any of you know of more fixes/checks that could possibly get the model training properly, then please do contribute and I’ll try to update the list.

**Note that it is common to make more of the pretrained model trainable once the new layers have been initially trained “enough”.

*Other names for the issue to help searches get here…
keras tensorflow theano CNN convolutional neural network bad training stuck fixed not static broken bug bugged jammed training optimization optimisation only 0.5 accuracy does not change only predicts one single class wont train model stuck on class model resetting itself between epochs keras CNN same output
