When does class imbalance become an issue in binary image classification?
At the moment I’m trying to classify two groups of skin lesions. The dataset contains about 2,300 images; 90% belong to class A and the remainder to class B. Even for a trained dermatologist it can be hard to distinguish A (benign) from B (malignant).
I’m using a ResNet-50, image size 299, batch size 4, wd = 0.01. After 7 epochs of training the final layer and 3 more after unfreeze(), the error rate is 0.09, which is about the prior probability of class B.
About 60% of class B images get misclassified as false negatives (‘benign skin lesions’): the CNN predicts almost everything to be class A. As a result the specificity (the fraction of truly benign lesions that are predicted ‘benign’) is still very good.
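To make the situation concrete, here is how these metrics relate on a confusion matrix. The counts below are made up to mirror the numbers described (90/10 split, ~60% of malignant cases missed), not taken from the actual run:

```python
# Hypothetical confusion counts for a 230-image validation split
# (90% class A / benign). Purely illustrative numbers.
tp = 9    # malignant predicted malignant
fn = 14   # malignant predicted benign (~60% of class B missed)
tn = 205  # benign predicted benign
fp = 2    # benign predicted malignant

sensitivity = tp / (tp + fn)  # recall on malignant: poor here
specificity = tn / (tn + fp)  # recall on benign: excellent here
error_rate = (fn + fp) / (tp + fn + tn + fp)

print(f"sensitivity={sensitivity:.2f}, "
      f"specificity={specificity:.2f}, error={error_rate:.3f}")
```

This shows how a near-prior error rate and high specificity can coexist with very low sensitivity, which is exactly the failure mode of a classifier that predicts the majority class almost everywhere.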
How can I improve on this?
Penalize certain types of mistakes more heavily, for example by weighting the loss or by changing the decision threshold for saying either A or B?
Train only on the misclassified images (how???)
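The first idea above can be sketched in two parts: a class-weighted cross-entropy (weights inversely proportional to class frequency, using the 90/10 split from the question) and a lowered decision threshold for calling something class B. This is a minimal numpy sketch, not fastai-specific code; the probabilities are toy values:

```python
import numpy as np

# Class weights inversely proportional to frequency (90/10 split):
counts = np.array([0.9, 0.1])
weights = counts.sum() / (2 * counts)  # -> [~0.56, 5.0]

def weighted_ce(p_b, y, w=weights):
    """Cross-entropy that penalizes a missed class-B case ~9x more
    than a missed class-A case. p_b is the predicted P(class B)."""
    p = np.clip(np.where(y == 1, p_b, 1 - p_b), 1e-12, 1.0)
    return float(np.mean(w[y] * -np.log(p)))

def predict(p_b, threshold=0.2):
    """Threshold shift: call it class B already at P(B) >= 0.2
    instead of the default 0.5."""
    return (p_b >= threshold).astype(int)

p_b = np.array([0.45, 0.30, 0.10, 0.70])  # toy predicted P(malignant)
print(predict(p_b))  # the default 0.5 cutoff would miss the first two
```

In fastai/PyTorch, the same idea is usually expressed by passing a `weight` tensor to `CrossEntropyLoss`; the threshold is then applied at prediction time, and you would tune it on a validation set against sensitivity/specificity rather than guessing 0.2.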
PS: the images are from the HAM10000 dataset. Very interesting dataset; it contains 7 classes in total. Currently I focus on two of them: nevi and melanoma.
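A related, simpler remedy than training only on misclassified images is to oversample the minority class during training, so each batch sees melanoma examples far more often than their 10% share. A numpy sketch of the sampling weights (this mimics what PyTorch's `WeightedRandomSampler` does; the labels are toy data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy labels mirroring the ~90/10 nevi-vs-melanoma split.
labels = np.array([0] * 90 + [1] * 10)

# Weight each image by 1 / (its class frequency), then draw an
# epoch with replacement: classes come out roughly balanced.
class_freq = np.bincount(labels) / labels.size
sample_w = 1.0 / class_freq[labels]
idx = rng.choice(labels.size, size=labels.size,
                 replace=True, p=sample_w / sample_w.sum())
print(f"melanoma share after resampling: {labels[idx].mean():.2f}")
```

The melanoma share of the resampled epoch lands near 0.5 instead of 0.1. The trade-off is that the minority images repeat, so stronger augmentation helps avoid overfitting them.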
For more information: there is an article on a CNN that uses this dataset among others, developed by a team from Stanford University. They are ‘on par’ with a team of about 30 dermatologists. Their dataset contains about 120,000 images in total(!).