I’ve built an image classifier that distinguishes between two classes of pizza: capricciosa and diavola. IPYNB available here: https://github.com/dbudhrani/capricciosa-o-diavola/blob/master/pizza-classifier.ipynb
As you can see at the bottom of the notebook (stage-5), the final error rate is 0%, and the train_loss and valid_loss are 0.049157 and 0.001129 respectively. That does not seem too bad to me, although the train loss is higher than the validation loss (could that be a sign of underfitting?)
Anyway, the point of this post is that I’ve deployed this model on Render and tested it with some images. I was expecting a great classifier that would determine the correct class with high confidence. Unfortunately, that’s not the case: for both capricciosa and diavola images, the results are similar to the following:
Confidence: [('capricciosa', '73.1%'), ('diavola', '26.9%')]
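For context, that confidence list is just the model’s softmax probabilities formatted as percentages. A minimal sketch of how such output could be produced (the names `classes` and `probs` are illustrative placeholders, not the actual deployment code):

```python
# Illustrative: turn class names and softmax probabilities into the
# (class, percentage) list shown above. `classes` and `probs` are
# placeholder names with example values, not the real deployment code.
classes = ['capricciosa', 'diavola']
probs = [0.731, 0.269]  # example softmax output from the model

# Sort by probability (most confident first), then format as percentages
pairs = sorted(zip(classes, probs), key=lambda cp: cp[1], reverse=True)
confidence = [(name, f'{p:.1%}') for name, p in pairs]
print('Confidence:', confidence)
# → Confidence: [('capricciosa', '73.1%'), ('diavola', '26.9%')]
```

The point is that ~73% is the model’s actual probability estimate, not a rounding artifact of the display.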
What could be the reason for this gap in performance between validation and inference?
Note that I performed some cleaning of the images in the notebook (the downloaded set contained all kinds of images), but, as you can see in the last plot_top_losses of the notebook, there were images of diavola pizzas, and they were correctly classified with high confidence.