Why is my neural network only predicting 2 out of 5 classes in the confusion matrix?

architect · January 2, 2020, 1:19am

The code for my kernel, written on Kaggle, can be found here: https://github.com/LucidLefo/retinopathy-fastai-basic/blob/master/diabetic-retinopath-fastai.ipynb

This is for the diabetic retinopathy competition on Kaggle, where I wanted to apply what I learned from the resources here using the fastai library. For some reason, even though the labels in the data run from 0 to 4, the model only seems to predict classes 0 and 2. I don’t think overall accuracy is the problem, since it ended up at a satisfactory ~75%.
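To make the symptom concrete, here is a minimal sketch with made-up labels (toy values, not the real outputs of my notebook). The helper `confusion_matrix` and the lists `y_true` and `y_pred` are hypothetical names used only for illustration. It shows predictions landing only in classes 0 and 2 while accuracy still looks reasonable:

```python
# Toy illustration only: these labels are invented, not taken from the notebook.
N_CLASSES = 5  # labels run from 0 to 4

# Hypothetical ground truth: 10 zeros, 2 ones, 5 twos, 1 three, 2 fours.
y_true = [0] * 10 + [1] * 2 + [2] * 5 + [3] * 1 + [4] * 2
# Hypothetical collapsed predictions: the model only ever outputs 0 or 2.
y_pred = [0] * 10 + [0] * 2 + [2] * 5 + [2] * 1 + [2] * 2


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Return a matrix where rows are true classes and columns are predictions."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


matrix = confusion_matrix(y_true, y_pred, N_CLASSES)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
predicted_classes = sorted(set(y_pred))

for row in matrix:
    print(row)
print("accuracy:", accuracy)
print("classes ever predicted:", predicted_classes)
```

In this toy case, columns 1, 3 and 4 of the matrix are entirely zero, which is the same pattern I see in my confusion matrix.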

So, what am I doing wrong here?

Thank you so much!

bwarner · January 2, 2020, 5:50am

I’m guessing it’s because your model isn’t learning from the retinas themselves but rather from image metadata such as image size and pixel counts. This pitfall was highlighted during the competition by Tom Aindow.
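One rough way to check for this kind of leak is to see how well the label can be guessed from image size alone. The sketch below is only an illustration under assumptions: the `records` list is invented toy data, and the function names are hypothetical. On the real dataset you would build `records` yourself, for example by reading each file's dimensions with Pillow's `Image.open(path).size`.

```python
from collections import Counter, defaultdict

# Invented toy data: (width, height, label). Not taken from the competition.
records = [
    (640, 480, 0), (640, 480, 0), (640, 480, 0), (640, 480, 1),
    (2416, 1736, 2), (2416, 1736, 2), (2416, 1736, 2), (2416, 1736, 3),
    (1050, 1050, 4), (1050, 1050, 4),
]


def label_counts_by_size(rows):
    """Count how often each label occurs for each (width, height) pair."""
    by_size = defaultdict(Counter)
    for width, height, label in rows:
        by_size[(width, height)][label] += 1
    return dict(by_size)


def size_only_accuracy(rows):
    """Accuracy of always guessing the most common label for each image size."""
    by_size = label_counts_by_size(rows)
    correct = sum(max(counts.values()) for counts in by_size.values())
    return correct / len(rows)


def majority_class_accuracy(rows):
    """Accuracy of always guessing the single most common label overall."""
    labels = Counter(label for _, _, label in rows)
    return max(labels.values()) / len(rows)


print("guessing from size only:", size_only_accuracy(records))
print("guessing majority class:", majority_class_accuracy(records))
```

If guessing from size alone scores far above the majority-class baseline, as it does on this toy data, the image dimensions carry label information, and a model can pick that up instead of the retina itself.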

You can look at other high-scoring public notebooks on Kaggle, or peruse the top solution write-ups in the discussion for this competition, to see how others dealt with this and other pitfalls. The competition organizers had a number of tricks up their sleeves to (hopefully) ensure the winning solutions generalized well, and it appears you stumbled into one of them.

I noticed that your notebook links to the first lesson. If you are just starting the fastai course, you might want to practice with a less challenging dataset and then come back to this one.