Performance Metrics: How to improve precision?

mgloria · June 5, 2019, 11:54pm

I am building an image classifier with an imbalanced dataset (98% vs 2%). However, I have more than 10,000 images for the minority class, so I decided to balance it (50/50) by undersampling the majority class, and then trained my model (the architecture is resnet50). Note: my validation set keeps the real proportions (i.e. the original imbalance).
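A minimal sketch of the undersampling step described above, assuming the training images are already grouped by class as lists of file paths (the function name and the toy paths are hypothetical). Only the training split is balanced; the validation split keeps the original 98/2 proportions.

```python
import random

def undersample(paths_by_class, seed=0):
    """Randomly undersample every class down to the size of the smallest class.

    Hypothetical helper: paths_by_class maps a label to a list of image paths.
    Apply it to the training split only, never to the validation split.
    """
    rng = random.Random(seed)
    n_min = min(len(paths) for paths in paths_by_class.values())
    balanced = {}
    for label, paths in paths_by_class.items():
        # Sample without replacement, so no image is used twice.
        balanced[label] = rng.sample(paths, n_min)
    return balanced

# Toy example with the same 98/2 ratio (hypothetical file names).
train = {
    "majority": [f"maj_{i}.jpg" for i in range(980)],
    "minority": [f"min_{i}.jpg" for i in range(20)],
}
balanced = undersample(train)
print({label: len(paths) for label, paths in balanced.items()})
```

With real data the minority class (10,000+ images) sets the size, and a random subset of the same size is drawn from the majority class.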

My results are the following: accuracy and recall are both over 90%, but precision is only 20%! Maybe I am just confused, but shouldn't accuracy and precision be closely related in an imbalanced-dataset problem? Am I doing something wrong in the procedure?
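These numbers are not contradictory. Accuracy is dominated by the 98% majority class, while precision, TP / (TP + FP), is hurt by false positives drawn from that same huge class. The sketch below uses hypothetical confusion-matrix counts, chosen only to reproduce the reported pattern, for a 10,000-image validation set at the real 98/2 split: misclassifying about 7% of the majority class already produces four false positives for every true positive.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of everything predicted minority, how much is minority
    recall = tp / (tp + fn)      # of all minority images, how many were found
    return accuracy, precision, recall

# Hypothetical validation set: 10,000 images, 2% minority.
n_pos, n_neg = 200, 9800
tp = 180            # 90% recall: 180 of the 200 minority images are found
fn = n_pos - tp     # 20 minority images missed
fp = 720            # about 7.3% of the majority images flagged as minority
tn = n_neg - fp     # 9,080 majority images correctly rejected

acc, prec, rec = metrics(tp, fp, fn, tn)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
# accuracy=0.926 precision=0.200 recall=0.900
```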

What tricks can I use to improve precision?
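A sketch of one standard trick for this exact setup, prior correction, assuming the model outputs a probability `p` for the minority class (e.g. from a softmax). A model trained on a 50/50 sample implicitly assumes a 50% prior, but the validation data has a 2% prior, so its scores overstate the minority class. Rescaling by the ratio of priors (Bayes' rule) gives scores calibrated to the real prevalence; thresholding the corrected score at 0.5 here is equivalent to requiring a raw score of at least 0.98. This trades some recall for precision. The function names are hypothetical.

```python
def correct_prior(p, train_prior=0.5, true_prior=0.02):
    """Rescale a minority-class probability from the training prior to the real prior."""
    pos = p * (true_prior / train_prior)
    neg = (1 - p) * ((1 - true_prior) / (1 - train_prior))
    return pos / (pos + neg)

def predict_minority(p, threshold=0.5, **priors):
    """Hypothetical decision rule: threshold the prior-corrected score."""
    return correct_prior(p, **priors) >= threshold

for raw in (0.5, 0.9, 0.99):
    print(raw, round(correct_prior(raw), 3), predict_minority(raw))
```

Alternatively, the decision threshold can be tuned directly on the validation set (for example from a precision/recall curve) to reach the precision you need.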

maral · June 6, 2019, 12:56am

Precision tells us what fraction of the model's positive (minority-class) predictions are actually correct, i.e. TP / (TP + FP). To improve precision you need to address the class imbalance. You can read about the problem for structured datasets in the imblearn docs.

https://imbalanced-learn.readthedocs.io/en/stable/index.html

For image classification, I would randomly oversample the minority class in each batch.
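A minimal, framework-free sketch of per-batch oversampling, assuming the labels of the training images are available as a list (the generator name is hypothetical). Each slot in a batch first picks a class uniformly and then an image from that class, so minority images are drawn with replacement and batches are roughly 50/50. In PyTorch, `torch.utils.data.WeightedRandomSampler` with per-sample weights of 1 / (class count) and `replacement=True` achieves the same effect when passed to a `DataLoader`.

```python
import random

def balanced_batches(labels, batch_size, n_batches, seed=0):
    """Yield batches of dataset indices in which every class is equally likely.

    Minority-class indices are drawn with replacement, so the same image can
    appear several times per epoch (random oversampling).
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    classes = sorted(by_class)
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            cls = rng.choice(classes)                 # pick a class uniformly
            batch.append(rng.choice(by_class[cls]))   # then an image of that class
        yield batch

# Toy dataset with a 98/2 split: label 1 is the minority class.
labels = [0] * 980 + [1] * 20
batches = list(balanced_batches(labels, batch_size=64, n_batches=100))
drawn = [labels[i] for batch in batches for i in batch]
print(f"minority share: {sum(drawn) / len(drawn):.2f}")
```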

mgloria · June 6, 2019, 7:03am

Thanks for the link, great resource! @maral, is there a specific way in fastai to apply data augmentation to only one of the classes (i.e. the minority class)? That would certainly speed up training.

maral · June 9, 2019, 11:53pm

The easiest way to achieve this is to create file copies of the images in the minority class.
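A minimal sketch of the file-copy approach, assuming a hypothetical folder layout where the minority-class training images live in their own directory (e.g. `train/minority`). Assuming augmentations are applied randomly each time an image is loaded, as is typical, each copy receives different random transforms, so the minority class is effectively augmented more often. Copy only inside the training folder; duplicating validation images would distort the metrics.

```python
import shutil
from pathlib import Path

def duplicate_minority(class_dir, n_copies):
    """Write n_copies extra copies of every image in class_dir, next to the originals.

    Hypothetical helper: returns the number of files created.
    """
    class_dir = Path(class_dir)
    # Materialize the list first so newly created copies are not copied again.
    originals = [p for p in class_dir.iterdir() if p.is_file()]
    for src in originals:
        for k in range(1, n_copies + 1):
            dst = src.with_name(f"{src.stem}_copy{k}{src.suffix}")
            shutil.copy2(src, dst)
    return len(originals) * n_copies
```

For example, `duplicate_minority("train/minority", 4)` would give every minority image four extra copies.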