My dogs vs cats models always have 0.5 accuracy - what's wrong?

After I fiddled around with the Vgg16 model provided by Jeremy, I came to the conclusion that I was just playing around with the knobs, and not really understanding the nuts and bolts. So I figured the best way to learn would be to create my own model in Keras for cats&dogs redux.

My first model looks like this:

from keras.models import Sequential
from keras.layers import Flatten, Dense, Activation

model = Sequential([
    Flatten(input_shape=(3,224,224)),
    Dense(100),
    Activation('relu'),
    Dense(2),
    Activation('softmax')
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

The result:

model.fit_generator(generator=train_batches, 
                samples_per_epoch=train_batches.nb_sample, 
                validation_data=val_batches, nb_val_samples=val_batches.nb_sample,
                nb_epoch=1)
Epoch 1/1
22500/22500 [==============================] - 267s - loss: 8.0619 - acc: 0.4998 - val_loss: 8.2267 - val_acc: 0.4896
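
One reading of the numbers above, offered as a hypothesis rather than a diagnosis: a loss near 8.06 at 50% accuracy is what you would see if the softmax were saturated, i.e. the model puts essentially all probability on one class for every image. Assuming Keras clips predicted probabilities at its default epsilon of 1e-7 before taking the log, each confident wrong answer costs about 16.12 and each confident right answer about 0, so the mean loss is roughly 16.12 times the error rate. The sketch below uses only the figures from the log; the function name is made up for illustration.

```python
import math

# Assumption: Keras clips probabilities to [eps, 1 - eps] with eps = 1e-7
# (the default value of K.epsilon()) before computing cross-entropy.
EPS = 1e-7
loss_if_wrong = -math.log(EPS)       # cost of a fully confident wrong answer
loss_if_right = -math.log(1 - EPS)   # cost of a fully confident right answer

def saturated_loss(accuracy):
    """Mean cross-entropy if every prediction is a hard 0/1 guess."""
    return accuracy * loss_if_right + (1 - accuracy) * loss_if_wrong

train_estimate = saturated_loss(0.4998)  # reported loss: 8.0619
val_estimate = saturated_loss(0.4896)    # reported val_loss: 8.2267
print(round(train_estimate, 2), round(val_estimate, 2))
```

If that reading is right, plausible suspects would be unscaled inputs (raw 0 to 255 pixel values) or a learning rate that is too high, either of which can push the logits to extremes within the first few updates.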

So I tried another simple model, this time with a convolutional layer.

model = Sequential([
    Convolution2D(64,3,3, input_shape=(3,224,224)),
    Activation('relu'),
    Flatten(),
    Dense(2),
    Activation('softmax')
])

Again, same thing: accuracy 0.5. During training, accuracy always hovers around 0.5, and doesn’t improve. Training more epochs doesn’t help.

I also tried the vgg-like convnet example from the Keras Getting started documentation (https://keras.io/getting-started/sequential-model-guide/#examples).
Here you see the same effect: during training the accuracy circles around 0.5, and never improves.
I tried changing the optimizer and changing the learning rate, but I always get a similar result.

None of the models ever improve, and they always converge on an accuracy of 0.5. So clearly, I must be doing something wrong. But I cannot figure out what.

Anybody have any clue?

Tim

What does get_batches return? Found x images belong to how many classes?

get_batches looks OK:

Found 22500 images belonging to 2 classes.
Found 2500 images belonging to 2 classes.

Ok, sorry, this will sound silly, but under the train and validation directories you have two directories, one that has only images of cats and the other one only images of dogs, right?

I doubt the other hypothesis as well, but nothing else comes to mind, so here it goes… Could you please use keras.optimizers.Adam(lr=1e-6) as the optimizer in model.compile? Other than that, you might also want to increase the number of nodes in the dense layer.

I can’t test it right now but everything looks okay so the ideas I have are a bit of a longshot.

Guessing it must be an issue with the optimization algorithm / learning rate (assuming nothing rather unusual is going on, like having cat / dog pictures randomly distributed across folders, etc. ;))

As a reference point, this very simple model got me to about 0.60 validation accuracy and a loss of about 0.69 on the training set:

import keras

model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=(3,224,224)))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Dense(2, W_regularizer=keras.regularizers.l2(0.02)))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Activation('softmax'))

model.compile(keras.optimizers.Adam(lr=1e-5), 'categorical_crossentropy', metrics=['accuracy'])

And that is on a very tiny subset of the data:

Train on 500 samples, validate on 250 samples
Epoch 1/10
0s - loss: 0.7967 - acc: 0.5500 - val_loss: 8.4732 - val_acc: 0.4560
Epoch 2/10
0s - loss: 0.7569 - acc: 0.5620 - val_loss: 8.3480 - val_acc: 0.4600
Epoch 3/10
0s - loss: 0.7370 - acc: 0.6080 - val_loss: 7.4428 - val_acc: 0.4680
Epoch 4/10
0s - loss: 0.7220 - acc: 0.6560 - val_loss: 5.5043 - val_acc: 0.4880
Epoch 5/10
0s - loss: 0.7119 - acc: 0.6640 - val_loss: 4.1466 - val_acc: 0.5560
Epoch 6/10
0s - loss: 0.7049 - acc: 0.6740 - val_loss: 3.5068 - val_acc: 0.5680
Epoch 7/10
0s - loss: 0.6995 - acc: 0.6760 - val_loss: 3.1914 - val_acc: 0.5760
Epoch 8/10
0s - loss: 0.6948 - acc: 0.6860 - val_loss: 3.0288 - val_acc: 0.5840
Epoch 9/10
0s - loss: 0.6903 - acc: 0.6920 - val_loss: 2.9563 - val_acc: 0.5880
Epoch 10/10
0s - loss: 0.6856 - acc: 0.6900 - val_loss: 2.9640 - val_acc: 0.6040

Thus a dense layer with 100 nodes should be plenty, which I guess supports the hypothesis that there is likely an issue with the optimizer / lr.

First off: I hope it’s something silly :)

  • directory structure: check. only cat pictures in train/cat and valid/cat, only dog pictures in train/dog and valid/dog.
  • tried the Adam optimizer before, and that didn’t help either.
  • tried increasing/decreasing the learning rate
  • I trained (finetuned) Jeremy’s Vgg16 model on my batches using the two lines from lesson 1, and that works fine. I conclude my input is not the problem.

What really baffles me is that during training, the accuracy for every batch is always around .5, right from the start, no matter which of my own models I use! The major differences from the lesson 1 Vgg16 model are:

  • only the last layer gets trained, all other layers already have their weights precalculated.
  • the input layer is preprocessed (center around mean and re-arrange color channels)

I tried adding this preprocessing, but that didn’t help. So the optimizer & loss function look like the next things to fiddle around with?
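
For reference, the lesson 1 style of preprocessing can be sketched in plain Python for a single pixel. The mean values are assumed here to be the usual ImageNet per-channel means, so check them against your copy of vgg16.py; the function name is made up for illustration.

```python
# Assumed ImageNet channel means in RGB order (verify against your vgg16.py).
VGG_MEAN_RGB = (123.68, 116.779, 103.939)

def vgg_preprocess_pixel(rgb):
    """Subtract the per-channel mean, then reorder RGB -> BGR."""
    centered = [value - mean for value, mean in zip(rgb, VGG_MEAN_RGB)]
    return centered[::-1]

pixel = (255, 128, 0)            # raw values in the 0..255 range
print(vgg_preprocess_pixel(pixel))
```

Note that centering alone still leaves values in roughly the -124 to 151 range. For a network trained from scratch, dividing by 255 as well might be worth a try.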

Tim

Yes, I would give optimizer and loss a shot :) If that doesn’t help, and unless someone else has a better idea, the next step would probably be to create as simple a script as possible that reproduces the problem and share it here.

You could also try doing model.summary() - maybe there is something unusual in layer arrangements that got screwed up when you were attempting fine tuning.

for layer in model.layers:
    print(layer, layer.trainable)

The above should confirm that you do have trainable layers in your model. If all layers were initialized randomly and none were being trained, you would get results bouncing a little above and below 0.5 accuracy.

@TimW

My bet is you’ve mangled the dimensions in your convolutional layer, using tf ordering for Theano or vice versa.

Check the model input and outputs of the first convolutional layer (maybe set one filter to all 1s first if using random weights - will give you a blurry image if working well).
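
A minimal sketch of the all-ones filter check, in plain Python on a tiny single-channel image (the helper name and the toy image are made up for illustration). A 3x3 filter of ones sums each neighbourhood, so dividing by 9 gives a box blur. If the real layer output looks like noise instead of a blurred picture, the axis ordering is a likely suspect.

```python
def box_filter_3x3(image):
    """Valid 3x3 convolution with an all-ones filter, scaled to a mean."""
    rows, cols = len(image), len(image[0])
    output = []
    for r in range(rows - 2):
        row = []
        for c in range(cols - 2):
            window_sum = sum(image[r + dr][c + dc]
                             for dr in range(3) for dc in range(3))
            row.append(window_sum / 9.0)
        output.append(row)
    return output

# One bright pixel in the centre of a dark 5x5 image.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 9
blurred = box_filter_3x3(image)
print(blurred)  # the single bright pixel is spread evenly over the 3x3 output
```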

I tested your network. It gives me similar results.

Maybe start with MNIST and check that it works?

@radek Thanks for your help so far! I’ve tried your model, and it trains better (training accuracy improves over epochs). Validation accuracy is still around 0.5.

@torkku A few months back, I briefly played around with Tensorflow and MNIST when I was trying to take the free Udacity course. Then I found @jeremy's course and it’s a lot more informative, so one of the things I was going to try next is to see if I could reproduce my Tensorflow results on MNIST in Keras.

The general problem I’m having with deep learning is that I understand (up to a point) how it works, but not why. I’d like to get at least a basic understanding of how you get to a certain architecture, and how to interpret the results of the training. I feel this is really necessary if you want to apply this technique to other types of problems.

@TimW Were you able to find a solution? I’m having similar issues.

I had a similar problem and eventually found out that I had forgotten to shuffle the input/training samples (doh!). So make sure you have shuffle=True in your get_batches for the training samples. :)
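
To illustrate why shuffling matters, here is a small plain-Python sketch (helper names and sizes are made up). When images are read in directory order, all cats come before all dogs, so every batch contains a single class and each update pulls the model toward predicting only that class.

```python
import random

def batches(labels, batch_size):
    """Split a label list into consecutive batches."""
    return [labels[i:i + batch_size] for i in range(0, len(labels), batch_size)]

def single_class_fraction(label_batches):
    """Fraction of batches that contain only one class."""
    pure = sum(1 for batch in label_batches if len(set(batch)) == 1)
    return pure / len(label_batches)

# Directory order: all cats (0) first, then all dogs (1).
labels = [0] * 640 + [1] * 640

unshuffled = single_class_fraction(batches(labels, 64))

shuffled_labels = labels[:]
random.Random(0).shuffle(shuffled_labels)   # fixed seed for repeatability
shuffled = single_class_fraction(batches(shuffled_labels, 64))

print(unshuffled, shuffled)
```

Without shuffling, every one of the 20 batches is single-class; after shuffling, a pure batch of 64 is extremely unlikely.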

The general problem I’m having with deep learning is that I understand (up to a point) how it works, but not why. I’d like to get at least a basic understanding of how you get to a certain architecture, and how to interpret the results of the training. I feel this is really necessary if you want to apply this technique to other types of problems.

Stick with the course and you’ll get there. It takes a while for these complex subjects to sink in. Even now, months after taking the course, I find myself returning to lectures just to make sure I understand, or googling alternative references to get a different point of view. Six months ago I was in exactly that boat and I’m starting to feel like it’s clicking. Persistence pays off. :)

Hi, I’ve had this issue a number of times now, so thought to make a little recap of it and possible solutions etc. to help people in the future.

Issue: Model predicts one of the 2 (or more) possible classes for all data it sees*

Confirming the issue is occurring:

  • Method 1: accuracy for the model stays around 0.5 while training (or 1/n, where n is the number of classes).
  • Method 2: get the counts of each class in the predictions and confirm the model is predicting a single class.
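
Method 2 can be sketched in plain Python as follows. The probability rows here are invented for illustration; in practice they would come from something like model.predict on your validation data, and the helper name is hypothetical.

```python
from collections import Counter

def predicted_class_counts(probabilities):
    """Count how often each class index is the argmax of a prediction row."""
    predicted = [max(range(len(row)), key=row.__getitem__)
                 for row in probabilities]
    return Counter(predicted)

# Hypothetical model output for four images: every row favours class 1.
probs = [[0.01, 0.99], [0.02, 0.98], [0.00, 1.00], [0.03, 0.97]]
counts = predicted_class_counts(probs)
print(counts)  # Counter({1: 4})
```

A Counter with a single key means the model is predicting one class for everything.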

Fixes/Checks (in somewhat of an order):

  • Double Check Model Architecture: use model.summary(), inspect the model.
  • Check Data Labels: make sure the labelling of your train data hasn’t got mixed up somewhere in the preprocessing etc. (it happens!)
  • Check Train Data Feeding Is Randomised: make sure you are not feeding your train data to the model one class at a time. For instance if using ImageDataGenerator().flow_from_directory(PATH), check that param shuffle=True and that batch_size is greater than 1.
  • Check Pre-Trained Layers Are Not Trainable:** If using a pre-trained model, ensure that any layers that use pre-trained weights are NOT initially trainable. For the first epochs, only the newly added (randomly initialised) layers should be trainable; for layer in pretrained_model.layers: layer.trainable = False should be somewhere in your code.
  • Ramp Down Learning Rate: Keep reducing your learning rate by factors of 10 and retrying. Note you will have to fully reinitialize the layers you are trying to train each time you try a new learning rate. (For instance, I had this issue that was only solved once I got down to lr=1e-6, so keep going!)
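
The learning-rate step in the list above could be organised as a small sweep. This is only a sketch: build_and_train is a hypothetical callback that you would write yourself, and it must create a freshly initialised model for each learning rate, train it briefly, and return the validation accuracy. The stand-in callback below returns invented numbers purely so the sketch runs without Keras.

```python
def sweep_learning_rates(build_and_train, exponents=range(3, 7)):
    """Try learning rates 1e-3, 1e-4, 1e-5, 1e-6 (by default), in that order.

    build_and_train(lr) is a hypothetical callback: it should build a NEW
    model each time so earlier runs do not leak into later ones.
    """
    results = []
    for k in exponents:
        lr = 10.0 ** -k
        results.append((lr, build_and_train(lr)))
    return results

# Stand-in callback with made-up behaviour (NOT real results): pretends
# that only learning rates at or below 1e-5 escape the 0.5 plateau.
def fake_build_and_train(lr):
    return 0.5 if lr > 2e-5 else 0.8

for lr, val_acc in sweep_learning_rates(fake_build_and_train):
    print(lr, val_acc)
```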

If any of you know of more fixes/checks that could possibly get the model training properly, then please do contribute and I’ll try to update the list.

**Note that it is common to make more of the pretrained model trainable once the new layers have been initially trained “enough”.

*Other names for the issue to help searches get here…
keras tensorflow theano CNN convolutional neural network bad training stuck fixed not static broken bug bugged jammed training only 0.5 accuracy only predicts one single class wont train model stuck on class model resetting itself between epochs

How is your train_batches defined: categorical or binary?

You are doing binary classification, so I suggest you use sigmoid as the final activation, because sigmoid is meant for this kind of task, and binary_crossentropy for the loss.

Thank you so much!!
I had the same problem and reading this article solved the problem.

The cause of the problem was the ‘softmax’ function in the last dense layer.

  • model.add(keras.layers.Activation('softmax'))

Since the classification is ‘binary’, the last activation function must be ‘sigmoid’.
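
A note on why the output activation can matter, under the assumption that the final layer has a single unit (Dense(1)), which the posts above do not state. Softmax normalises across the units of a layer, so with one unit it always outputs 1.0 and nothing can be learned. With two units, as in the first post of this thread, softmax is mathematically equivalent to a sigmoid on the difference of the two logits, so in that setup it is not wrong by itself. The sketch below checks both statements in plain Python.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

# A single-unit softmax is constant, whatever the logit.
single_unit = [softmax([z])[0] for z in (-5.0, 0.0, 3.2)]
print(single_unit)  # [1.0, 1.0, 1.0]

# A two-unit softmax matches a sigmoid of the logit difference.
a, b = 1.5, -0.7
two_unit = softmax([a, b])[0]
print(two_unit, sigmoid(a - b))
```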

I hope a researcher with the same problem can read this and solve this problem.

Another important point to note here is that if you’re loading images for binary classification using an ImageDataGenerator, it’s really important to use the argument

class_mode="binary",

I was following 03. Convolutional Neural Networks and Computer Vision with TensorFlow - Zero to Mastery TensorFlow for Deep Learning and used exactly the same code, except I forgot to add that line when loading the data. I got an accuracy stuck at 0.5. As soon as I added that line to the loader, everything worked fine and my accuracy increased with each epoch as expected. Hope that helps.
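
The reason class_mode matters is the shape of the labels the generator yields. flow_from_directory defaults to class_mode="categorical", which produces one-hot rows and pairs with Dense(2), softmax and categorical_crossentropy; class_mode="binary" produces one scalar per image and pairs with Dense(1), sigmoid and binary_crossentropy. The helpers below are made up to mimic the two label formats in plain Python; they are not the Keras implementation.

```python
def to_binary(class_indices):
    """Mimics class_mode="binary": one scalar 0/1 label per image."""
    return [float(i) for i in class_indices]

def to_categorical(class_indices, num_classes=2):
    """Mimics class_mode="categorical": one one-hot row per image."""
    return [[1.0 if i == k else 0.0 for k in range(num_classes)]
            for i in class_indices]

indices = [0, 1, 1]                 # e.g. cat, dog, dog
print(to_binary(indices))           # [0.0, 1.0, 1.0]
print(to_categorical(indices))      # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
```

Mixing the two (for example one-hot labels with a single sigmoid unit) is the kind of mismatch that can leave accuracy stuck at 0.5.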