Understanding learning rate graphs and models

First off, I’d like to apologize if my question has already been answered in some form elsewhere on this forum, but based on my searching I couldn’t find a question in this exact format.

Working through Lesson 2 in Colab, I found that I had a significantly higher error rate than Jeremy got using ResNet-34; in fact, my error rate was about 10 times his. To lower the error rate I switched to ResNet-50, but saw little to no improvement.

When using fit_one_cycle I did see improvements in my model’s accuracy; however, it was nowhere near the accuracy of Jeremy’s model.

Given that my learning rate finder plot shows a sharp spike in loss after 1e-03 and doesn’t have the characteristic strong downward slope, should I use the highest learning rate that doesn’t result in a major increase in loss?
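For reference, this is roughly the Lesson 2 workflow I’m running (a minimal sketch assuming the fastai v1 API; `data` stands in for the ImageDataBunch I built from my downloaded images):

```python
from fastai.vision import *  # fastai v1, as used in Lesson 2

# Assumed setup: `data` is an ImageDataBunch built from my downloaded images
learn = cnn_learner(data, models.resnet34, metrics=error_rate)

# Run the learning rate finder and plot loss vs. learning rate
learn.lr_find()
learn.recorder.plot()

# Rule of thumb from the lesson: pick a value well below the point where the
# loss starts to blow up. If the spike is after 1e-03, something around 1e-04
# seems like a reasonable starting point (this is the part I'm unsure about).
learn.fit_one_cycle(4, max_lr=1e-4)
```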

Based on answers to other similar questions, I’m of the opinion that the difference in error rate is due to outlying images in my dataset, or simply that I had fewer images. However, the recommended solution was to “clean out” the dataset. Is there any way to automate this, especially for larger datasets, without manually filtering the images (ImageCleaner)? Doing it by hand seems inefficient and impractical at scale.
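To be clear, this is roughly the manual route from the lesson notebook that I’m hoping to avoid (a sketch assuming the fastai v1 widgets API, with `learn` and `path` being my trained Learner and dataset path):

```python
from fastai.widgets import DatasetFormatter, ImageCleaner

# Rank images by loss so the ones the model is most confused about come first;
# `learn` is assumed to be a Learner already trained on the dataset.
ds, idxs = DatasetFormatter().from_toplosses(learn)

# Opens a Jupyter widget for manually relabelling or removing images.
# It records decisions in a cleaned.csv under `path` rather than deleting files.
ImageCleaner(ds, idxs, path)
```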

Any and all help would be deeply appreciated. Thank you very much!

Hi there,
if you use a different dataset, it’s no wonder you don’t see exactly the same results. You’re also right that a smaller dataset will most likely have a negative impact on your error rate.

Also, you might want to look at accuracy (or a similar metric) instead of the plain loss. A standard classification net uses cross-entropy loss, and that isn’t normalized in any way. That means with 20 classes your loss will usually be higher than with 10 classes, simply because the baseline for guessing at random grows with the number of classes (roughly the log of the class count), so raw loss values aren’t directly comparable across datasets.
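To make that concrete, here’s a quick sketch of the random-guess baseline in plain PyTorch (the numbers come out to the log of the class count):

```python
import torch
import torch.nn.functional as F

# Cross-entropy of a model that predicts a uniform distribution over k classes
# is log(k), so the "know-nothing" baseline grows with the number of classes.
for k in (10, 20):
    logits = torch.zeros(1, k)      # uniform prediction over k classes
    target = torch.tensor([0])      # any class label
    loss = F.cross_entropy(logits, target)
    print(k, loss.item())           # ~2.30 for 10 classes, ~3.00 for 20
```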

I would add accuracy and maybe top_k_accuracy (depending on how many classes you have) as metrics and compare those first, before assuming your model is bad based on the raw loss.
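A minimal sketch of what I mean (fastai v1; `data` is assumed to be your ImageDataBunch, and k=3 is just an example value):

```python
from functools import partial
from fastai.vision import *
from fastai.metrics import accuracy, error_rate, top_k_accuracy

# top_k_accuracy defaults to k=5; wrap it in a partial to change k
learn = cnn_learner(
    data,
    models.resnet50,
    metrics=[error_rate, accuracy, partial(top_k_accuracy, k=3)],
)
learn.fit_one_cycle(4)
```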