Is it normal for a confusion matrix to be perfect?

I’m on Lesson 2 of the course and I decided to take inspiration from the bears classifier in the book and make my own model, but using different species of butterflies this time. My confusion matrix shows a perfect score, with the model making no errors. Is this possible or is there a chance I’m doing something wrong while training the model?

Here is the code from my Google Colab:

I had a look at the code. I didn't execute it or do any testing myself, so it's hard to say for sure, but here are a few possible explanations and ideas to test.

The confusion matrix and error rate most likely show a perfect score for two reasons: the validation set (and possibly the entire dataset) is too small, and the classification task is very easy. Each butterfly species has very distinctive colors and patterns, which makes it easy for the model to classify, especially on such a small validation set. There is also the possibility of data leakage between the training and validation sets: if all the pictures of Monarchs in the dataset are very similar to each other, the ones in the validation set would be similar too, and the model could have effectively memorized (overfit) them during training.
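One quick way to see how thin the validation set is: if you have a fastai `DataLoaders` (I'm assuming it's called `dls` here, as in the Lesson 2 notebook; the names below are illustrative), you can print the split sizes and the per-class counts in the validation set.

```python
from collections import Counter

# Minimal sanity check, assuming a fastai DataLoaders named `dls` built the
# Lesson 2 way (the variable names here are assumptions, not from your notebook).
print(f"train: {len(dls.train_ds)}  valid: {len(dls.valid_ds)}")  # a tiny valid set makes 0 errors easy
print(dls.vocab)                                                  # the class labels the model sees

# Per-class counts in the validation set: with only a handful of images per
# species, a "perfect" confusion matrix is not very informative.
valid_counts = Counter(dls.vocab[int(y)] for _, y in dls.valid_ds)
print(valid_counts)
```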

Possible things to test:

  1. Increase the dataset and validation set size.
  2. Create (manually) a separate test set that you don’t use during fine-tuning and only score once at the end of training (see the sketch after this list).
  3. Use k-fold cross-validation.
  4. Try different butterflies; for example, use 3 species that have similar colors or patterns and see how well the model does in that case.
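For point 2, here's a rough sketch of what scoring on a held-out test set could look like with fastai. It assumes your learner is called `learn`, your metric is `error_rate`, and the extra images live in a `butterflies_test/` folder with one subfolder per species (all of these names and paths are assumptions, so adjust them to your notebook).

```python
from fastai.vision.all import *

# Hypothetical folder of images that were never used during fine-tuning,
# organized as one subfolder per species (same layout as the training data).
test_files = get_image_files(Path('butterflies_test'))

# with_labels=True reuses the DataBlock's labelling function (parent_label)
# to label the test images from their folder names.
test_dl = learn.dls.test_dl(test_files, with_labels=True)

# validate() returns [loss, *metrics]; with error_rate as the metric,
# the second value is the error on this held-out set.
print(learn.validate(dl=test_dl))

# Confusion matrix on the held-out set rather than the training-time valid set.
interp = ClassificationInterpretation.from_learner(learn, dl=test_dl)
interp.plot_confusion_matrix()
```

If the model still scores perfectly on images it has truly never seen, the task really is that easy; if not, the original score was mostly an artifact of the small validation set.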

That makes sense. I thought the same and expanded my dataset to 10 different kinds of butterflies and now the confusion matrix does result in some errors. Thanks!
