I’m working on a project where I have to identify anomalies in images. At first, I thought a simple two-class classifier would work, but the problem is that I have over 100,000 “normal” images and fewer than 50 anomalies. These are medical images, and I know there will be other kinds of anomalies in the wild that the current dataset doesn’t capture. I have tried a one-class SVM on the image histograms, but it failed miserably. What would be another approach for this problem?
From what I heard in the fastai videos, deep learning should ideally be able to handle imbalanced data.
That being said, assuming the 100k images are representative of your real-world data, you can try the autoencoder approach used for data drift detection to identify the anomalies: train it on the normal images only, then flag any image it reconstructs poorly.
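To make the idea concrete, here is a minimal sketch of scoring by reconstruction error. It is not what anyone in this thread actually ran: to keep it small and dependency-light it uses a linear autoencoder solved in closed form with an SVD (equivalent to PCA) in place of a deep convolutional autoencoder, and random synthetic vectors in place of flattened images. The function names, the 4-dimensional code size, and the 99th-percentile threshold are all assumptions chosen for illustration.

```python
import numpy as np


def fit_linear_autoencoder(normal, n_components):
    """Fit on normal samples only; returns (mean, components)."""
    mean = normal.mean(axis=0)
    # Rows of vt are principal directions. The top-k rows act as both
    # the encoder and the decoder weights of a linear autoencoder.
    _, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(x, mean, components):
    """Per-sample mean squared error between input and reconstruction."""
    centered = x - mean
    codes = centered @ components.T   # encode to the low-dimensional code
    recon = codes @ components        # decode back to input space
    return np.mean((centered - recon) ** 2, axis=1)


rng = np.random.default_rng(0)

# Synthetic stand-in for flattened "normal" images: points near a
# 4-dimensional subspace of a 64-dimensional space, plus small noise.
basis = rng.normal(size=(4, 64))
normal_train = rng.normal(size=(2000, 4)) @ basis + 0.05 * rng.normal(size=(2000, 64))
normal_test = rng.normal(size=(200, 4)) @ basis + 0.05 * rng.normal(size=(200, 64))

# Synthetic anomalies: vectors that do not follow the normal structure.
anomalies = 2.0 * rng.normal(size=(20, 64))

mean, components = fit_linear_autoencoder(normal_train, n_components=4)

# Threshold chosen from normal data alone (assumed: 99th percentile),
# so no labelled anomalies are needed to set it.
train_err = reconstruction_error(normal_train, mean, components)
threshold = np.percentile(train_err, 99)

normal_flags = reconstruction_error(normal_test, mean, components) > threshold
anomaly_flags = reconstruction_error(anomalies, mean, components) > threshold

print("flagged normal fraction:", normal_flags.mean())
print("flagged anomaly fraction:", anomaly_flags.mean())
```

The point of this setup is that nothing is ever trained on the anomalies, so the 50 labelled ones can be kept aside purely for checking the threshold. With real images you would swap the SVD step for a convolutional autoencoder and keep the scoring and thresholding logic the same.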
Autoencoder is what worked best, thanks for suggesting it!
Glad that it worked. Also have a look at the Alibi Detect package from Seldon. It’s got some nice features for outlier detection, drift detection and so on.