Hi,
I’m trying to solve a facial expression recognition problem with the dataset provided at
http://www.kasrl.org/jaffe.html
In order to simplify the problem, I considered only two classes: neutral and “the rest” (where “the rest” actually comprises six different classes: sad, angry…).
I think the main problem is that the number of samples is really small (213 overall). Is this even feasible with CNNs? I tried reusing VGG16 (replicating the grayscale channel into three RGB channels) with no luck, but I guess this problem is quite different from the one VGG16 was meant to solve (grayscale images, and only faces).
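(In case it matters, the channel replication I used was just stacking the single grayscale channel three times with NumPy; a minimal sketch, with illustrative array names and a channels-last layout assumed:)

```python
import numpy as np

# x: a batch of grayscale images, shape (n, 224, 224, 1), channels last
x = np.random.rand(4, 224, 224, 1).astype('float32')

# Repeat the single channel three times along the channel axis
# to get pseudo-RGB input with the shape VGG16 expects
x_rgb = np.repeat(x, 3, axis=-1)  # shape (4, 224, 224, 3)
```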
I also tried an ad-hoc model along the lines of VGG16:
from keras.models import Sequential
from keras.layers import BatchNormalization, Conv2D, MaxPooling2D, Flatten, Dense

data_format = 'channels_first'  # inputs are (1, 224, 224): one grayscale channel

model = Sequential()
# axis=1 so normalization is per-channel with channels-first data
model.add(BatchNormalization(axis=1, input_shape=(1, 224, 224)))
model.add(Conv2D(8, (3, 3), activation='relu', padding='same', data_format=data_format))
model.add(MaxPooling2D((3, 3), strides=(3, 3), data_format=data_format))
model.add(Conv2D(16, (3, 3), activation='relu', padding='same', data_format=data_format))
model.add(MaxPooling2D((2, 2), strides=(3, 3), data_format=data_format))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', data_format=data_format))
model.add(MaxPooling2D((2, 2), strides=(3, 3), data_format=data_format))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', data_format=data_format))
model.add(MaxPooling2D((2, 2), strides=(3, 3), data_format=data_format))
model.add(Flatten())
model.add(Dense(2, activation='softmax'))
The problem is that, during training, the loss goes up and down most of the time (although a decreasing trend can be observed). After 100 epochs, one run gives
loss: 0.1450 - acc: 0.9444 - val_loss: 0.1556 - val_acc: 0.9750
but another run may give different (worse) results. The performance seems to be highly dependent on the initialization. What should I make of this (the training loss oscillating during training, and different runs giving inconsistent results)? I would have thought that a model with 25,986 parameters could always overfit the training set (here, 151 images of 224×224, after splitting into training, validation, and test sets) and reach 100% accuracy on the training set (although probably with poor validation accuracy).
Any thoughts on this?
Cheers.