I was trying to train the other dense layers and something strange is happening.
When I ran the first epoch to tune the other dense layers (after tuning the last layer) the accuracy of the training set improved very slightly. When I ran more epochs the accuracy of the TRAINING set (not the validation set) started to decrease.
I thought that the cost function would never reach a local minimum, according to Jeremy’s classes, so how could it increase? Still, I moved back one step (reloaded the weights from before the accuracy decreased) and tried reducing the step size (learning rate) from 0.01 to 0.003, in case the training was diverging, but the accuracy also decreased with this lower step size.
Is this even possible? If it were the validation set accuracy decreasing, it could simply be overfitting, but this seems very strange to me. I tried to use Jeremy’s code for this, but it’s also possible that I made a mistake somewhere in the code…
Just to confirm, I’ve always used the same training set, both when I trained the last layer and now, trying to tune the previous layers.
Any thoughts on this?
Since you are using a pre-trained network, make sure that you have set shuffle to False for your training batches as well, not just for the validation batches.
You have to use the same ordering of images when using a pre-trained network.
You mean, because I trained the last layer alone and then the other layers in a different step? That could be the reason, I’m not sure if I shuffled the batches. I’ll check it and post the results here. But I don’t understand why that is a problem.
Since I’m going through the whole training set in each epoch, what’s the difference if in a couple epochs (when training only the last layer) I split the training set in batches in one way, and in the next epochs (with all layers) I split the same training set in different batches?
Or are you saying that some images which were in the validation set may have moved to the training set? In that case I guess it makes sense, since that changes the cost function of the training set… Although the accuracy lowered progressively after epoch 2, 3… instead of only lowering in the first epoch. I suppose after lowering once, it would gradually recover from that point, no?
@agois Learning rate may be an issue in this case. Try to use Adagrad or RMSprop optimizers.
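To make the learning-rate point concrete, here is a toy sketch (plain Python, not Keras, and not the code from the course) of how the training loss itself can go up when the step is too large. It assumes a one-parameter problem, f(w) = w**2, and uses the textbook RMSprop update, which may differ in detail from what a given library does. The function names and constants are made up for illustration.

```python
# Toy illustration only: minimise f(w) = w**2, whose gradient is 2*w.

def gradient_descent(lr, steps=5, w=1.0):
    """Return the loss after each step of plain gradient descent."""
    losses = []
    for _ in range(steps):
        grad = 2.0 * w           # derivative of w**2
        w = w - lr * grad        # plain gradient step
        losses.append(w * w)
    return losses


def rmsprop(lr, steps=5, w=1.0, rho=0.9, eps=1e-8):
    """Same problem, but each step is divided by a running RMS of the gradient."""
    losses = []
    avg_sq = 0.0                 # running average of squared gradients
    for _ in range(steps):
        grad = 2.0 * w
        avg_sq = rho * avg_sq + (1.0 - rho) * grad * grad
        w = w - lr * grad / (avg_sq ** 0.5 + eps)
        losses.append(w * w)
    return losses


too_large = gradient_descent(lr=1.1)   # each step overshoots: w -> -1.2 * w
small = gradient_descent(lr=0.1)       # each step shrinks w:  w -> 0.8 * w
adaptive = rmsprop(lr=0.01)

print("lr=1.1 :", too_large)
print("lr=0.1 :", small)
print("rmsprop:", adaptive)
```

With the large step the loss grows on every iteration even though it is computed on the very data being optimised, so a falling training accuracy is possible without any overfitting. On a real network the threshold at which this happens is not known in advance, which is why trying a few smaller learning rates is a reasonable first check.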
You actually can (and usually should) shuffle when training, even with a pretrained network.
@agois No, sorry, I got confused.
I was talking about precomputed convolutional layer outputs and somehow understood you were using those.
If you are not precomputing the convolutional layer outputs, you should shuffle your training batches.
Maybe try a lower learning rate, as @KhanSuleyman suggested.
Which Kaggle competition is this?
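A small sketch of why the ordering only matters in the precomputed case. This is plain Python with made-up file names and a fake "conv layer" function, not the actual Keras generators. The assumption is that the features are computed once and the labels are read separately in directory order, so the two lists only line up if the images were fed through unshuffled.

```python
import random

# Hypothetical stand-ins: file names and labels in directory order.
filenames = ["cat.0.jpg", "cat.1.jpg", "dog.0.jpg", "dog.1.jpg"]
labels = [0, 0, 1, 1]  # 0 = cat, 1 = dog


def fake_conv_features(name):
    """Stand-in for the conv layers' output: just tags the file name."""
    return "features(" + name + ")"


def precompute(order):
    """Run the fake conv layers once over the files, in the given index order."""
    return [fake_conv_features(filenames[i]) for i in order]


def count_mismatches(features, stored_labels):
    """Count rows whose stored label disagrees with the file the features came from."""
    wrong = 0
    for feat, label in zip(features, stored_labels):
        true_label = 0 if "cat" in feat else 1
        if true_label != label:
            wrong += 1
    return wrong


# Unshuffled pass: features come out in the same order as the labels.
unshuffled = precompute([0, 1, 2, 3])

# One possible shuffled pass: features come out in a different order,
# but the labels list is still in directory order.
shuffled = precompute([2, 0, 3, 1])

print(count_mismatches(unshuffled, labels))  # features and labels line up
print(count_mismatches(shuffled, labels))    # some rows now carry the wrong label

# Once features and labels are paired, shuffling the pairs together is safe.
pairs = list(zip(unshuffled, labels))
random.shuffle(pairs)
paired_features = [f for f, _ in pairs]
paired_labels = [l for _, l in pairs]
```

When the whole network is run end to end on image batches, each batch carries its own labels, so shuffling cannot break the pairing and only changes which images share a batch.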
Thanks for your tips, guys! They made me look further into the Keras documentation to understand batches better, which was quite helpful. I believe I’m already using RMSprop, but I’ll try Adagrad or a lower learning rate.
And @torkku, for now I’m just trying the cats & dogs redux. Have you entered other competitions already?