I think maybe I should rephrase what I asked. Basically, I have two questions:
- Does the training accuracy of a model have to be higher than a certain point for overfitting to be a problem? If the training accuracy is 6% and the validation accuracy is 0.6%, am I overfitting or underfitting (or neither)? What about at 30% training and 0.6% validation?
- If overfitting is the problem, what can I do to avoid it beyond what I have already done?
If you need more information about my model: I am using the Kaggle Data Science Bowl competition dataset (the one on plankton classification) with a 90:10 split of training to validation data. I am using the following data augmentations (these are the arguments I pass to Keras's ImageDataGenerator):

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=360,
    width_shift_range=0.05,
    height_shift_range=0.05,
    shear_range=0.10,
    zoom_range=0.3,
    rescale=1./255,
    horizontal_flip=True,
    vertical_flip=True
)
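The generators are then created roughly like this (a simplified sketch, not my exact script: the directory paths, target size, and the separate rescale-only validation generator are placeholders):

train_generator = datagen.flow_from_directory(
    'data/train',             # placeholder path to the 90% training split
    target_size=(64, 64),     # placeholder size; I resize the plankton images
    batch_size=64,
    class_mode='categorical'
)

# Placeholder setup: validation data is only rescaled, with no augmentation.
val_datagen = ImageDataGenerator(rescale=1./255)
val_generator = val_datagen.flow_from_directory(
    'data/validation',        # placeholder path to the 10% validation split
    target_size=(64, 64),
    batch_size=64,
    class_mode='categorical'
)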
And the following model (which includes some pieces I don't fully understand, added in an attempt to increase accuracy and reduce overfitting):
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.layers.advanced_activations import PReLU
from keras.layers.normalization import BatchNormalization

model = Sequential([
    Convolution2D(32, 3, 3, border_mode='same', init='he_uniform'),   # convolution layers
    PReLU(),
    BatchNormalization(axis=1),
    Convolution2D(32, 1, 1, border_mode='same', init='he_uniform'),   # 1x1 (pointwise) convolution; not actually a depthwise separable conv
    PReLU(),
    BatchNormalization(axis=1),
    Convolution2D(16, 3, 3, border_mode='same', init='he_uniform'),
    MaxPooling2D((4, 4)),                                              # max pooling
    PReLU(),
    BatchNormalization(axis=1),
    Dropout(0.2),
    Convolution2D(64, 3, 3, border_mode='same', init='he_uniform'),
    PReLU(),
    BatchNormalization(axis=1),
    Convolution2D(128, 3, 3, border_mode='same', init='he_uniform'),
    PReLU(),
    BatchNormalization(axis=1),
    Convolution2D(128, 1, 1, border_mode='same', init='he_uniform'),  # 1x1 (pointwise) convolution; not actually a depthwise separable conv
    PReLU(),
    BatchNormalization(axis=1),
    Convolution2D(64, 3, 3, border_mode='same', init='he_uniform'),
    PReLU(),
    BatchNormalization(axis=1),
    Flatten(),
    Dropout(0.5),
    Dense(512, init='he_uniform'),
    PReLU(),
    BatchNormalization(),
    Dropout(0.5),
    Dense(512, init='he_uniform'),
    PReLU(),
    BatchNormalization(),
    Dropout(0.5),
    Dense(121, activation='softmax', init='he_uniform')               # 121 plankton classes
])
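For completeness, each epoch is run roughly like this (a simplified sketch, not my exact script: the SGD optimizer and the validation-sample count are placeholders; the 0.02 learning rate and the batch size of 64 set on the generators are the real values):

from keras.optimizers import SGD

# Placeholder optimizer choice; lr=0.02 is the learning rate I actually use.
model.compile(optimizer=SGD(lr=0.02),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# One epoch per call, hence the repeated "Epoch 1/1" lines in the log below.
model.fit_generator(train_generator,
                    samples_per_epoch=25786,    # training images, as in the log
                    nb_epoch=1,
                    validation_data=val_generator,
                    nb_val_samples=2800)        # placeholder for the 10% validation split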
With a learning rate of 0.02 and a batch size of 64, I get the following results (notice that val_acc does not change at all):
Epoch 1/1
25786/25786 [==============================] - 966s - loss: 3.5803 - acc: 0.2331 - val_loss: 15.8057 - val_acc: 0.0055
Epoch 1/1
25786/25786 [==============================] - 966s - loss: 2.7237 - acc: 0.3129 - val_loss: 16.0225 - val_acc: 0.0055
Epoch 1/1
25786/25786 [==============================] - 966s - loss: 2.5801 - acc: 0.3360 - val_loss: 15.9979 - val_acc: 0.0055
I am quite new to this and would appreciate any pointers on relevant material to read or just a general direction on how to improve these results.