When does class imbalance become an issue in binary image classification?
At the moment I’m trying to classify two groups of skin lesions. The dataset contains about 2300 images, 90% of which belong to class A and the remainder to class B. Even for a trained dermatologist it can be hard to distinguish between A (benign) and B (malignant).
I’m using ResNet-50, image size 299, batch size 4, wd = 0.01. After 7 epochs on the final layer and 3 more using unfreeze(), the error rate is 0.09, which is roughly the prior probability of class B.
About 60% of class B gets misclassified as false negatives (‘benign skin lesions’). The problem is that the CNN predicts almost everything to be in class A. So the specificity (the fraction of truly benign lesions that are predicted ‘benign’) is still very good.
How can I improve on this?
Punish certain types of mistakes more heavily, for example by changing the decision threshold for predicting either A or B?
Train only on the misclassified images (how???)
Oversampling?
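On the first idea, a minimal sketch of threshold shifting, assuming `probs` holds the model’s predicted probability of class B (malignant) per image; the 0.2 cut-off is an illustrative value one would tune on a validation set, not a recommendation:

```python
# Sketch: shift the decision threshold instead of taking the argmax.
# Lowering the threshold below 0.5 trades specificity for recall on
# the rare malignant class.

def predict_with_threshold(probs, threshold=0.2):
    """Flag a lesion as malignant (class B) whenever its predicted
    probability of B exceeds `threshold`."""
    return ["B" if p > threshold else "A" for p in probs]

preds = predict_with_threshold([0.05, 0.30, 0.60, 0.15])
# With threshold 0.5 only the 0.60 case would be flagged;
# at 0.2 the 0.30 case is flagged as malignant as well.
```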
PS: the images are from the HAM10000 dataset. Very interesting; it contains 7 classes in total. Currently I focus on two classes: nevi and melanoma.
Accuracy is not the best metric when you have class imbalance. If you predict everything is benign your accuracy is 90%. Similar problems have appeared in Kaggle competitions in the recent past. Some people train using an F1 loss. See this excellent kernel: https://www.kaggle.com/iafoss/pretrained-resnet34-with-rgby-0-460-public-lb
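To make that concrete, a small self-contained illustration (plain Python, no library assumed) of why accuracy is misleading here: a classifier that predicts “benign” for every image at a 90/10 split still scores 90% accuracy, while its F1 score for the malignant class is 0:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives -> precision/recall both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = ["A"] * 90 + ["B"] * 10  # 90/10 imbalance in miniature
y_pred = ["A"] * 100              # predict everything benign
print(accuracy(y_true, y_pred))          # 0.9
print(f1_for_class(y_true, y_pred, "B")) # 0.0
```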
Thank you. I was already trying to implement the f1_score and will examine the Kaggle kernel.
Still, the metrics don’t influence the training process and I’m also interested in approaches that change the training process like oversampling and training on misclassified images.
In this specific case the ‘cost’ of missing a malignant lesion should be much higher.
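One standard way to encode that asymmetric cost is class-weighted cross-entropy. A minimal sketch, where the weights (1.0, 5.0) are an illustrative assumption, not a tuned value; in PyTorch/fastai the same idea is `nn.CrossEntropyLoss(weight=...)`:

```python
import math

# Sketch of class-weighted cross-entropy: a mistake on class B
# (malignant, index 1) costs 5x a mistake on class A (benign, index 0).
# The weights are illustrative and would be tuned in practice.

def weighted_ce(prob_of_true_class, true_class, weights=(1.0, 5.0)):
    """Negative log-likelihood scaled by the weight of the true class."""
    return -weights[true_class] * math.log(prob_of_true_class)

# The same under-confident prediction (p = 0.2 for the true class)
# is penalized 5x more when the true class is malignant.
loss_miss_malignant = weighted_ce(0.2, true_class=1)
loss_miss_benign = weighted_ce(0.2, true_class=0)
```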
Today I tried to implement the F1 metric. The Kaggle kernel you referred to uses a callback, F1_callback(Callback). This class refers to the F1 metric when initialized. I just copied the whole class and tried to apply it in my fastai 2019 framework.
Unfortunately it throws a bunch of errors. Any tips on how to implement this F1 metric?
When literally copying the example I receive the following error:
Hello Sinsji:
The kernel I shared is for fastai v0.7. Are you working on v1.0?
Also, can you share more of the code? It would be hard to follow what you are trying to do without it.
Best,
Maria
Oversampling has not worked that well for me. I tried it in the protein competition, and while I may not know how to do it correctly, I just don’t quite trust it.
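For reference, the usual oversampling recipe is to give each sample a weight inversely proportional to its class frequency, so a sampler drawing with those weights sees both classes roughly equally often. A minimal sketch in plain Python; in PyTorch these weights would feed `torch.utils.data.WeightedRandomSampler`:

```python
from collections import Counter

# Sketch of inverse-frequency sample weights for oversampling.
# Each sample's weight is 1 / (count of its class), so rare-class
# samples are drawn proportionally more often.

def sample_weights(labels):
    counts = Counter(labels)
    return [1.0 / counts[lbl] for lbl in labels]

labels = ["A"] * 9 + ["B"]        # 90/10 imbalance in miniature
weights = sample_weights(labels)
# Each B sample is drawn 9x as often as each A sample:
# weights[-1] / weights[0] == 9.0
```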
As far as I know, in medicine the type of mistake is sometimes more relevant than the overall accuracy. In those cases the goal is more specific, like the cost of missing a certain disease or of running an unnecessary, expensive test. So in this case I would like to punish the learning process for missing the severe disease.