I am redoing some experiments with the cats & dogs (redux) data, and I’ve been observing something a bit odd: my validation loss is often lower than my training loss (and, correspondingly, the validation accuracy is higher than the training accuracy). For example, here is a training run with five epochs at each learning rate, and validation is ahead of training at every epoch:
[details=Click to see the output]
>>>>> regime at lr=0.009999999776482582
Epoch 1/5
718/718 [==============================] - 167s - loss: 0.5972 - acc: 0.9525 - val_loss: 0.3264 - val_acc: 0.9743
Epoch 2/5
718/718 [==============================] - 167s - loss: 0.5927 - acc: 0.9583 - val_loss: 0.3138 - val_acc: 0.9787
Epoch 3/5
718/718 [==============================] - 172s - loss: 0.5325 - acc: 0.9636 - val_loss: 0.3742 - val_acc: 0.9731
Epoch 4/5
718/718 [==============================] - 177s - loss: 0.5416 - acc: 0.9637 - val_loss: 0.3526 - val_acc: 0.9766
Epoch 5/5
718/718 [==============================] - 177s - loss: 0.5292 - acc: 0.9651 - val_loss: 0.3817 - val_acc: 0.9746
>>>>> regime at lr=0.0010000000474974513
Epoch 1/5
718/718 [==============================] - 177s - loss: 0.5069 - acc: 0.9662 - val_loss: 0.2572 - val_acc: 0.9822
Epoch 2/5
718/718 [==============================] - 178s - loss: 0.4951 - acc: 0.9675 - val_loss: 0.3179 - val_acc: 0.9776
Epoch 3/5
718/718 [==============================] - 178s - loss: 0.4664 - acc: 0.9687 - val_loss: 0.3260 - val_acc: 0.9773
Epoch 4/5
718/718 [==============================] - 178s - loss: 0.4775 - acc: 0.9685 - val_loss: 0.3465 - val_acc: 0.9771
Epoch 5/5
718/718 [==============================] - 175s - loss: 0.4629 - acc: 0.9691 - val_loss: 0.3090 - val_acc: 0.9787
>>>>> regime at lr=9.999999747378752e-05
Epoch 1/5
718/718 [==============================] - 175s - loss: 0.4386 - acc: 0.9706 - val_loss: 0.3539 - val_acc: 0.9766
Epoch 2/5
718/718 [==============================] - 175s - loss: 0.4599 - acc: 0.9695 - val_loss: 0.2984 - val_acc: 0.9807
Epoch 3/5
718/718 [==============================] - 174s - loss: 0.4480 - acc: 0.9701 - val_loss: 0.3146 - val_acc: 0.9787
Epoch 4/5
718/718 [==============================] - 171s - loss: 0.4522 - acc: 0.9697 - val_loss: 0.3461 - val_acc: 0.9776
Epoch 5/5
718/718 [==============================] - 175s - loss: 0.4581 - acc: 0.9694 - val_loss: 0.3307 - val_acc: 0.9783
>>>>> regime at lr=9.999999747378752e-06
Epoch 1/5
718/718 [==============================] - 175s - loss: 0.4445 - acc: 0.9706 - val_loss: 0.3168 - val_acc: 0.9792
Epoch 2/5
718/718 [==============================] - 174s - loss: 0.4496 - acc: 0.9696 - val_loss: 0.3562 - val_acc: 0.9766
Epoch 3/5
718/718 [==============================] - 165s - loss: 0.4329 - acc: 0.9710 - val_loss: 0.3510 - val_acc: 0.9766
Epoch 4/5
718/718 [==============================] - 165s - loss: 0.4505 - acc: 0.9700 - val_loss: 0.3160 - val_acc: 0.9792
Epoch 5/5
718/718 [==============================] - 165s - loss: 0.4451 - acc: 0.9703 - val_loss: 0.2921 - val_acc: 0.9807
>>>>> regime at lr=9.999999974752427e-07
Epoch 1/5
718/718 [==============================] - 165s - loss: 0.4405 - acc: 0.9699 - val_loss: 0.3076 - val_acc: 0.9792
Epoch 2/5
718/718 [==============================] - 165s - loss: 0.4344 - acc: 0.9706 - val_loss: 0.3304 - val_acc: 0.9778
Epoch 3/5
718/718 [==============================] - 165s - loss: 0.4531 - acc: 0.9693 - val_loss: 0.3190 - val_acc: 0.9787
Epoch 4/5
718/718 [==============================] - 165s - loss: 0.4383 - acc: 0.9706 - val_loss: 0.3365 - val_acc: 0.9776
Epoch 5/5
718/718 [==============================] - 165s - loss: 0.4646 - acc: 0.9690 - val_loss: 0.3964 - val_acc: 0.9736
[/details]
How should I interpret this? Does it just mean that the (random) validation split happened to pick up a selection of ‘easy’ examples (or at least easier than average)?
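
One way to test the "easy validation set" idea might be a simple resampling check, sketched below. It assumes per-example losses are available for both the training and validation images, all computed the same way (model in inference mode, no augmentation). The function name `split_easiness_check` and the loss values in the demo are made up for illustration; they do not come from the run above.

```python
import random
import statistics


def split_easiness_check(train_losses, val_losses, n_resamples=1000, seed=0):
    """Check whether the validation split looks unusually easy.

    Pools all per-example losses, repeatedly draws random subsets with the
    same size as the validation set, and returns:
      - the observed mean validation loss
      - the fraction of random subsets whose mean loss is <= that value
    A very small fraction suggests the split is easier than a typical
    random split would be.
    """
    rng = random.Random(seed)
    pooled = list(train_losses) + list(val_losses)
    k = len(val_losses)
    observed = statistics.fmean(val_losses)

    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(pooled, k)  # random split of the same size
        if statistics.fmean(sample) <= observed:
            hits += 1
    return observed, hits / n_resamples


if __name__ == "__main__":
    # Hypothetical placeholder losses, only to show the call.
    demo_rng = random.Random(1)
    train = [demo_rng.expovariate(2.0) for _ in range(2000)]
    val = [demo_rng.expovariate(2.0) for _ in range(200)]
    observed, fraction = split_easiness_check(train, val)
    print(f"mean val loss={observed:.4f}, fraction of random splits as easy={fraction:.3f}")
```

If the fraction is not small, the split is probably not the explanation, and the gap more likely comes from how the two numbers are measured during training.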