Drastic increase in accuracy when a saved model is loaded after a long time

Three months ago, I trained a DenseNet169 model on a dermatology dataset until it reached an error_rate of 0.43.

Recently I recreated the data bunch to test the accuracy, and the error rate had dropped to 0.17 while the training loss had increased.

This drastic decrease in error_rate (i.e., increase in accuracy) was very unexpected.

I reckoned this could be because of creating the data bunch again: the images in the training set and the validation set may no longer be the same. Some images that were in the validation set earlier may now be in the training set, and vice versa.

But the random.seed() values in both cases were the same, which made me question the above explanation.

Can anyone comment on what might have happened here, please?

Did you use exactly the same training and validation splits as in the previous training? If your splits are different, you will see higher accuracy, especially if your model was overfitting, because your validation set now contains cases the model already “knows” from training.


Yes, I used a valid_pct of 0.3 both times.

But if you don’t specify the random seed, you could get a different 0.7/0.3 split each time. You should fix the random seed to be sure the splits are the same.
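Also note that random.seed() only seeds Python’s built-in random module, while (if I remember correctly) the fastai v1 random split shuffles with numpy, so a global random.seed() call wouldn’t pin the split down. If you’re using the data block API, you can pass the seed to the split itself. A minimal sketch, assuming an ImageList-based pipeline and a hypothetical data path:

```python
from fastai.vision import *

# Hypothetical path -- replace with your dermatology dataset location.
path = Path('data/dermatology')

# Passing `seed` here seeds numpy before the shuffle, so the same
# 0.7/0.3 train/valid split is produced on every run.
data = (ImageList.from_folder(path)
        .split_by_rand_pct(valid_pct=0.3, seed=42)
        .label_from_folder()
        .transform(get_transforms(), size=224)
        .databunch(bs=32)
        .normalize(imagenet_stats))
```

With the seed fixed at the split itself, recreating the data bunch should reproduce the same train/validation partition regardless of the global RNG state.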

There are a few more seeds that need to be set to get “true” reproducibility. See here:

https://docs.fast.ai/dev/test.html#getting-reproducible-results
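Roughly, the idea on that page is to seed the Python, numpy, and torch RNGs together, along the lines of the sketch below (paraphrasing from memory; check the linked page for the exact code):

```python
import os
import random

import numpy as np
import torch

def seed_everything(seed=42):
    # Python's built-in RNG and hash seed
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    # numpy's RNG -- the one the fastai random split uses
    np.random.seed(seed)
    # torch RNGs, on CPU and on every GPU
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # make cuDNN deterministic (at some cost in speed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```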
