I am working on the cervical cancer screening competition and built a sequential model in Keras by mimicking statefarm-sample.ipynb. After playing with the architecture and some parameter settings, the model is now over-fitting heavily.
The model looks like this:
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.layers import Convolution2D, MaxPooling2D, BatchNormalization

mdl = Sequential([
    # normalize the raw channels-first (3, 224, 224) input
    BatchNormalization(axis=1, input_shape=(3, 224, 224)),
    Convolution2D(32, 3, 3, activation='relu'),
    MaxPooling2D((3, 3)),
    BatchNormalization(axis=1),
    Convolution2D(64, 3, 3, activation='relu'),
    MaxPooling2D((3, 3)),
    BatchNormalization(axis=1),
    Flatten(),
    Dense(256, activation='relu'),
    BatchNormalization(),
    Dense(128, activation='relu'),
    BatchNormalization(),
    # 3 cervix types -> softmax over 3 classes
    Dense(3, activation='softmax')
])
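For reference, I compile and fit it roughly like this (just a sketch: the Adam learning rate and the batches/val_batches generators are my assumptions, following the course's get_batches() utility):

from keras.optimizers import Adam

mdl.compile(optimizer=Adam(lr=1e-3),  # assumed optimizer / learning rate
            loss='categorical_crossentropy',
            metrics=['accuracy'])

# batches / val_batches are assumed to be Keras 1 directory iterators,
# e.g. from get_batches() in the course's utils.py
mdl.fit_generator(batches, samples_per_epoch=batches.nb_sample,
                  nb_epoch=10,
                  validation_data=val_batches,
                  nb_val_samples=val_batches.nb_sample)

which gives: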
Epoch 1/10
1185/1185 [==============================] - 404s - loss: 1.4565 - acc: 0.4667 - val_loss: 2.8802 - val_acc: 0.5534
Epoch 2/10
1185/1185 [==============================] - 310s - loss: 0.8629 - acc: 0.6160 - val_loss: 1.2412 - val_acc: 0.5221
Epoch 3/10
1185/1185 [==============================] - 310s - loss: 0.6719 - acc: 0.7418 - val_loss: 0.9542 - val_acc: 0.5187
Epoch 4/10
1185/1185 [==============================] - 308s - loss: 0.4703 - acc: 0.8532 - val_loss: 0.9490 - val_acc: 0.5003
Epoch 5/10
1185/1185 [==============================] - 311s - loss: 0.3258 - acc: 0.9063 - val_loss: 0.8843 - val_acc: 0.5874
Epoch 6/10
1185/1185 [==============================] - 311s - loss: 0.2230 - acc: 0.9485 - val_loss: 0.9492 - val_acc: 0.5133
Epoch 7/10
1185/1185 [==============================] - 311s - loss: 0.1555 - acc: 0.9586 - val_loss: 0.9892 - val_acc: 0.4568
Epoch 8/10
1185/1185 [==============================] - 312s - loss: 0.1106 - acc: 0.9772 - val_loss: 1.0158 - val_acc: 0.5126
Epoch 9/10
1185/1185 [==============================] - 310s - loss: 0.0565 - acc: 0.9932 - val_loss: 0.8905 - val_acc: 0.5513
Epoch 10/10
1185/1185 [==============================] - 313s - loss: 0.0568 - acc: 0.9873 - val_loss: 0.8879 - val_acc: 0.5894
I think this is not bad, since Jeremy said we should always start by over-fitting. However, I am not sure what to do next: should I start reducing the over-fitting now? I could add regularization to see if it helps, as in the sketch below. But if I do that, won't the best validation accuracy still be capped at around 0.6 — is that correct?
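For concreteness, here is one way I imagine adding regularization: dropout after the dense layers plus L2 weight decay on the convolutions. The dropout rates and decay values are just guesses, not something from the lesson:

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Convolution2D, MaxPooling2D, BatchNormalization
from keras.regularizers import l2

mdl = Sequential([
    BatchNormalization(axis=1, input_shape=(3, 224, 224)),
    # L2 weight decay on the conv weights (1e-4 is an assumed value)
    Convolution2D(32, 3, 3, activation='relu', W_regularizer=l2(1e-4)),
    MaxPooling2D((3, 3)),
    BatchNormalization(axis=1),
    Convolution2D(64, 3, 3, activation='relu', W_regularizer=l2(1e-4)),
    MaxPooling2D((3, 3)),
    BatchNormalization(axis=1),
    Flatten(),
    Dense(256, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),  # assumed dropout rate
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),
    Dense(3, activation='softmax')
])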
Or should I play with the model more to increase val_acc first, and deal with the over-fitting later? If so, what tricks can I try to increase val_acc? I guess data augmentation is worth trying (see the sketch after this paragraph), but is there anything beyond that?
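Here is how I would wire in augmentation with Keras's ImageDataGenerator — a sketch only; the transform ranges and the 'data/cerv/train' path are placeholders I made up:

from keras.preprocessing.image import ImageDataGenerator

# augmenting generator for the training set (ranges are guesses)
gen = ImageDataGenerator(rotation_range=15,
                         width_shift_range=0.1,
                         height_shift_range=0.1,
                         zoom_range=0.1,
                         horizontal_flip=True,
                         dim_ordering='th')  # channels-first, matching (3, 224, 224)

batches = gen.flow_from_directory('data/cerv/train',  # placeholder path
                                  target_size=(224, 224),
                                  batch_size=32,
                                  class_mode='categorical')

mdl.fit_generator(batches, samples_per_epoch=batches.nb_sample,
                  nb_epoch=10,
                  validation_data=val_batches,  # plain, un-augmented validation batches
                  nb_val_samples=val_batches.nb_sample)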
Thank you!