Drastic increase in accuracy when a saved model is loaded after a long time

Three months ago, I trained a DenseNet169 model on a dermatology dataset until it reached an error_rate of 0.43.

Recently I recreated the data bunch to test the accuracy, and the error rate had dropped to 0.17 while the training loss had increased.

This drastic decrease in error_rate (i.e., increase in accuracy) was very unexpected.

I reckoned this could be because of creating the data bunch again: the images in the training set and the validation set may no longer be the same. Some images that were in the validation set earlier may now be in the training set, and vice versa.

But the random.seed() values in both cases were the same, which made me question the above explanation.

Can anyone comment on what might have happened here, please?

Did you use exactly the same training and validation splits as in the previous training? If your splits are different, you will see higher accuracy, especially if your model was overfitting, because your validation set now contains cases the model already “knows” from training.


Yes, I used a valid_pct of 0.3 both times.

But if you don’t specify the random seed, you could get a different 0.7/0.3 split each time. You should fix the random seed to be sure the splits are the same.
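Also note that random.seed() only seeds Python’s built-in random module, while (if I remember correctly) the fastai v1 random split shuffles with numpy, so a global random.seed() call wouldn’t pin the split down. If you’re using the data block API, you can pass the seed to the split itself. A minimal sketch, assuming an ImageList-based pipeline and a hypothetical data path:

```python
from fastai.vision import *

# Hypothetical path -- replace with your dermatology dataset location.
path = Path('data/dermatology')

# Passing `seed` here seeds numpy before the shuffle, so the same
# 0.7/0.3 train/valid split is produced on every run.
data = (ImageList.from_folder(path)
        .split_by_rand_pct(valid_pct=0.3, seed=42)
        .label_from_folder()
        .transform(get_transforms(), size=224)
        .databunch(bs=32)
        .normalize(imagenet_stats))
```

With the seed fixed at the split itself, recreating the data bunch should reproduce the same train/validation partition regardless of the global RNG state.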

There are a few more seeds that need to be set to get “true” reproducibility. See here:

https://docs.fast.ai/dev/test.html#getting-reproducible-results
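Roughly, the idea on that page is to seed the Python, numpy, and torch RNGs together, along the lines of the sketch below (paraphrasing from memory; check the linked page for the exact code):

```python
import os
import random

import numpy as np
import torch

def seed_everything(seed=42):
    # Python's built-in RNG and hash seed
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    # numpy's RNG -- the one the fastai random split uses
    np.random.seed(seed)
    # torch RNGs, on CPU and on every GPU
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # make cuDNN deterministic (at some cost in speed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```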
