I’ve checked the statistics between the sets and they appear to match reasonably well - there’s no consistent pattern of difference between the mins, maxes, means, and stddevs. I went ahead and tested the idea of a very basic model with just two dense layers, as follows:
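(For reference, the kind of per-set comparison I mean can be sketched like this - the arrays `trn` and `val` here are hypothetical stand-ins for the actual train/validation image arrays:)

```python
import numpy as np

def summarize(name, x):
    # print basic distribution statistics for one set of images
    print('%s: min=%.3f max=%.3f mean=%.3f std=%.3f'
          % (name, x.min(), x.max(), x.mean(), x.std()))

# placeholder train/validation arrays of shape (n, 1, 224, 224);
# in practice these would be the loaded image sets
trn = np.random.rand(10, 1, 224, 224)
val = np.random.rand(10, 1, 224, 224)

summarize('train', trn)
summarize('valid', val)
```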
# super-simple model for testing
from keras.models import Sequential
from keras.layers.core import Flatten, Dense
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import PReLU

model = Sequential([
    # normalize the single input channel (Theano-ordered: channels first)
    BatchNormalization(axis=1, input_shape=(1,224,224)),
    Flatten(),
    Dense(512, init='he_uniform'),
    PReLU(),
    BatchNormalization(),
    # 121 plankton classes
    Dense(121, activation='softmax', init='he_uniform')
])
and, with a 0.02 learning rate, somehow got the following over three single-epoch runs:
Epoch 1/1
24269/24269 [==============================] - 127s - loss: 3.4677 - acc: 0.2222 - val_loss: 3.6337 - val_acc: 0.2100
Epoch 1/1
24269/24269 [==============================] - 127s - loss: 2.9663 - acc: 0.2755 - val_loss: 2.6406 - val_acc: 0.3221
Epoch 1/1
24269/24269 [==============================] - 127s - loss: 2.7689 - acc: 0.3072 - val_loss: 2.4368 - val_acc: 0.3610
I suppose that fixes the problem I was having with validation accuracy - clearly the 3 convolution layers and 3 dense layers I tried weren’t simple enough to avoid overfitting in this case! I’m surprised at how well this model works even without any convolution layers, though it seems to max out at around 40% accuracy.
I suppose my only problem now is getting above 40% with a more complex model, which is proving rather difficult - adding more layers seems to only decrease the accuracy (at least in the first epoch), for some reason.
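(In case it helps to make "a more complex model" concrete: the kind of intermediate step I have in mind is adding a single convolution block rather than three at once. This is an untested sketch in the same Keras 1 style as the model above - the filter count and pool size are guesses, not values I’ve validated:)

```python
from keras.models import Sequential
from keras.layers.core import Flatten, Dense
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import PReLU

# hypothetical next step: one conv block in front of the dense layers
model = Sequential([
    BatchNormalization(axis=1, input_shape=(1,224,224)),
    # 32 3x3 filters - a guessed starting point, not a tested value
    Convolution2D(32, 3, 3, border_mode='same', init='he_uniform'),
    PReLU(),
    MaxPooling2D((2,2)),
    Flatten(),
    Dense(512, init='he_uniform'),
    PReLU(),
    BatchNormalization(),
    Dense(121, activation='softmax', init='he_uniform')
])
```

If the first epoch still regresses with this, dropping the learning rate for the deeper model might be worth trying before adding anything else.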