Lesson 2 discussion

You can refer to the link below for setup on Windows (Keras + Theano + CPU):
http://wiki.fast.ai/index.php/Local_install_(Windows_only:cpu)

Hi Ian,

Sure no problem. I use Linux (not used Windows for about 17 years), and the setup was pretty straightforward. For the Python environment, I use Anaconda 3, which sits on Python 3.5, so some tweaks are required to Jeremy’s code to get it working, but they are pretty trivial changes for the most part. I’m also using Keras 2, so some of the Keras API code in the notebooks and vgg16.py etc. also needs modifying (again, pretty trivial changes), and Tensorflow backend (no code changes required - it just needs a change in keras.json).

Setting up Cuda and cuDNN to work with Tensorflow took a little effort (not sure if you’re going to use Linux or Windows, but if you need some tips for getting it set up in Linux, please let me know), but nothing too difficult.

Regarding processing times, it fares well - most of the model.fit calls in the Lesson 2 notebook took around 180s per epoch, with a batch size of 32.

Hope that helps, and please let me know if you have any specific environment setup questions,

Cheers,

Paul.


Thanks Paul! That gives me a good primer before I decide to make a switch when I get deeper into it.

Ian

@ltshan Thank you!

I have a question regarding the VGG model. I looked at the summary of the model and the last layer is just a Dense layer, so it basically does a matrix multiplication on evaluation. How come the output values from it are always between 0 and 1, and never above 1 or below 0? Mathematically, why does this happen?

Another question: what does it mean for softmax to be an activation function? In the Stanford notes, they talk about softmax as a loss function. I'm not sure what it means for it to be an activation function.

Thanks for sharing this code! Very useful.

I'm facing the same issue. Did you find a solution for this?

The ‘ft’ function adds a new Dense layer using this code:

model.add(Dense(num, activation='softmax'))

Note that we set a value for “activation” parameter here. It adds a softmax activation function after the matrix multiplication. The result of the softmax function is a vector of non-negative values that sum to 1.
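To see this concretely, here is a minimal NumPy sketch (not code from the course notebooks): softmax exponentiates each logit, which makes every value positive, and then divides by the sum, so the outputs are non-negative and sum to 1. This is also why the final Dense layer's outputs always land between 0 and 1.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max is a standard trick for numerical stability;
    # it does not change the result.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

probs = softmax(np.array([2.0, 1.0, -3.0]))
print(probs)        # approximately [0.727, 0.268, 0.005]
print(probs.sum())  # 1.0
```

Note that the largest logit gets the largest probability, and even a very negative logit maps to a small positive value, never below zero.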

Thanks for sharing this code.

Are you then calling the flow method like this?

batches = gen.flow(trn_features, trn_labels, batch_size=batch_size, shuffle=True)

Or am I missing something?

Why does the training error after the 5th (final) epoch differ from the error obtained from lm.evaluate()? I suppose they should be the same, as the final weights are determined by the final epoch. What might be causing the difference?

I’ve had this issue a number of times now, so thought to make a little recap of it and possible solutions etc. to help people in the future.

Issue: Model predicts one of the 2 (or more) possible classes for all data it sees*

Confirming issue is occurring: Method 1: accuracy for model stays around 0.5 while training (or 1/n where n is number of classes). Method 2: Get the counts of each class in predictions and confirm it’s predicting all one class.

Fixes/Checks (in somewhat of an order):

  • Double Check Model Architecture: use model.summary(), inspect the model.
  • Check Data Labels: make sure the labelling of your train data hasn’t got mixed up somewhere in the preprocessing etc. (it happens!)
  • Check Train Data Feeding Is Randomised: make sure you are not feeding your train data to the model one class at a time. For instance if using ImageDataGenerator().flow_from_directory(PATH), check that param shuffle=True and that batch_size is greater than 1.
  • Check Pre-Trained Layers Are Not Trainable:** If using a pre-trained model, ensure that any layers that use pre-trained weights are NOT initially trainable. For the first epochs, only the newly added (randomly initialised) layers should be trainable; for layer in pretrained_model.layers: layer.trainable = False should be somewhere in your code.
  • Ramp Down Learning Rate: Keep reducing your learning rate by factors of 10 and retrying. Note you will have to fully reinitialize the layers you are trying to train each time you try a new learning rate. (For instance, I had this issue that was only solved once I got down to lr=1e-6, so keep going!)
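The last check above can be sketched as a simple loop. This is purely schematic: train_model and stuck_at_chance are placeholder functions standing in for your real training and diagnostic code, and only the control flow is the point.

```python
def train_model(lr):
    # Placeholder: pretend training only escapes chance accuracy once
    # the learning rate is small enough. In reality you would rebuild
    # and train your actual model here.
    return 0.92 if lr < 2e-5 else 0.50

def stuck_at_chance(acc, n_classes=2, tol=0.02):
    # Accuracy hovering around 1/n_classes suggests the model is
    # predicting a single class.
    return abs(acc - 1.0 / n_classes) < tol

lr, acc = 1e-3, 0.0
while True:
    # Important: re-create the model from scratch on each attempt so the
    # new layers are freshly (randomly) initialised before training.
    acc = train_model(lr)
    if not stuck_at_chance(acc):
        break
    lr /= 10  # keep reducing by factors of 10 and retrying
```

With these placeholders the loop stops at roughly lr=1e-5; with a real model, the stopping point is whatever rate finally lets training move.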

If any of you know of more fixes/checks that could possibly get the model training properly, then please do contribute and I’ll try to update the list.

**Note that it is common to make more of the pretrained model trainable once the new layers have been initially trained “enough”

*Other names for the issue to help searches get here…
keras tensorflow theano CNN convolutional neural network bad training stuck fixed not static broken bug bugged jammed training optimization optimisation only 0.5 accuracy does not change only predicts one single class wont train model stuck on class model resetting itself between epochs keras CNN same output


You are looking at the training loss, which is an average over the epoch (while the weights are changing).

If you add validation data to fit, the validation loss will also be computed at each epoch, but at the end of the epoch. The validation loss of the last epoch should be identical to what you get when running evaluate on the validation data.
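A toy illustration of the difference, with made-up per-batch losses (no Keras involved): the loss printed during training is an average over the whole epoch, taken while the weights are still improving, so it overstates the loss of the final weights that evaluate() measures.

```python
# Simulated per-batch training losses within one epoch: the loss falls
# as the weights improve batch by batch.
batch_losses = [1.0, 0.8, 0.6, 0.4, 0.2]

# What the progress bar reports: the running average over the epoch.
reported_training_loss = sum(batch_losses) / len(batch_losses)  # ≈ 0.6

# Roughly what evaluate() sees: the loss with the final weights only.
final_weights_loss = batch_losses[-1]  # 0.2
```

The gap shrinks as training converges, since the weights change less within each epoch.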

Thank you @DBCerigo so much for summarizing. Very helpful. :clap:

Hi @jeremy, I have TensorFlow running on my GPU. How do I make utils.py run on my system, as it contains some Theano imports?

Thanks,
Pradyumna

Hello,

First of all, fantastic course. Thank you for sharing it with everyone.

I was trying to make predictions from test data after I retrain a couple of convolutional layers, as mentioned at the end of the Lesson 2 notes. But I notice that this

test_path = 'data/redux/test/'
test_data = get_batches(test_path)

is not detecting any images in the test folder at all. I checked on the command line and there are test images in that folder, and the path provided is accurate. A snapshot of the result is attached.

Please let me know what I am missing here.

Also, just for clarification: once I get the test predictions, are they inclusive of the finetune() that vgg16.py has, or should I be doing that separately for this model?

Can you run %pwd in a cell to see where you are? I suspect it’s because you are using a relative link (one that doesn’t start with a slash)
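If it helps, here is a quick way to see where a relative path actually resolves, assuming it is run from the notebook's working directory (i.e. what %pwd shows):

```python
import os

test_path = 'data/redux/test/'     # the path from the post above
print(os.path.isabs(test_path))    # False: it is resolved against the cwd
print(os.path.abspath(test_path))  # the directory that will really be read
```

If the absolute path printed is not where your images live, either cd the notebook's working directory or use an absolute path.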

I know this is a little “late to the party”, but just in case, for those going through Lesson 2 and having the same memory issues, I have rewritten that particular section of the notebook using predict_generator(). This avoids loading all the images in memory to run the predictions.

You can always check the README.md for why I went this route.

HTH.

Yes, spot on. Thanks a lot.

I implemented the Keras 2 version of cats vs dogs redux. It seems to be working OK, but I'm only getting ~89% accuracy at most. I previously got the Keras 1 version to 97% accuracy. Any ideas why I would get different results?
The Keras 2 version is using the same weight files as the Keras 1 version.
Is anyone getting Keras 2 to work with high accuracy? Which weights did you use?

Thanks

Jon