I have a tumor dataset in two folders, one named benign and the other malignant. It is an imbalanced dataset (roughly a 9:1 ratio), and I need to do binary classification.
When I try dataloader.train.show_batch(…), it shows images of only one class (benign, the majority class). Is that because the dataloader picks images at random and therefore mostly picks the majority class?
For my cnn_learner, I copied the GitHub code for the ‘OverSamplingCallback’ callback into a function called Over_Sampling_Callback and added print statements to it. I passed this function in the callback_fns of the cnn_learner, as follows:
learn = cnn_learner(dls, resnet34, metrics=error_rate, callback_fns=[partial(OverSamplingCallback)]). My print statements never run. Could it be that this function is not getting called?
In the cnn_learner, if I use a metric such as scikit-learn’s roc_auc_score, it tells me there is only one kind of value (benign) and that I cannot use roc_auc. If point 2 above works, shouldn’t I get a balanced dataset on which my metric is calculated?
@karthikr
The dataloader picks a batch of images and displays them (up to a limit, which you can set as well). If you’re seeing only one kind of label, it’s because of the skew in your data. If you run it again a few times, you might see the other label as well. But on average you’ll see only 1 image of the other label for every 9 images of the majority class.
Regarding your callback, can you please attach a link to the code? I can’t help you much without knowing what it is in the first place.
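To put a rough number on that, here is a minimal sketch. It assumes each displayed image is drawn independently with a 10% chance of being malignant, which only approximates shuffled sampling from a 9:1 dataset. The function name and the batch sizes of 4, 9 and 16 are illustrative choices, not values from your setup.

```python
def prob_single_class_batch(minority_frac: float, n_shown: int) -> float:
    """Probability that all n_shown images come from the majority class,
    assuming independent draws (an approximation of a shuffled dataloader)."""
    return (1.0 - minority_frac) ** n_shown


# Assumed display sizes; adjust to whatever show_batch is set to display.
for n in (4, 9, 16):
    p = prob_single_class_batch(0.1, n)
    print(f"{n:>2} images shown: P(only benign) = {p:.3f}")
```

Under this approximation, about 39% of 9-image batches would contain no malignant image at all, so seeing only benign images in one call is not surprising.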
Finally, what do you mean by a ‘balanced’ dataset?
Maybe try adding your print command under def on_train_begin(self, **kwargs): so that your print statement is executed at the beginning of training.
Yes, I think so too. cnn_learner returns a Learner object, and Learner does not take callbacks or callback_functions as keyword arguments. So passing callbacks during fit might work!
It does; however, IIRC there was something mentioned about passing callbacks that relate only to fit in the call to fit, otherwise you’ll run into issues. One example is the EarlyStoppingCallback: it shouldn’t be added to the Learner, otherwise it will always be present.
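To illustrate the difference, here is a toy sketch in plain Python. ToyLearner, PrintingCallback, Logger and EarlyStop are all hypothetical names made up for this example; this is not fastai's Learner or its callback API. It only shows the general pattern of callbacks stored on the object versus callbacks passed to a single fit call.

```python
class PrintingCallback:
    """Hypothetical callback base: prints when training begins."""
    def on_train_begin(self):
        print(f"{type(self).__name__}: on_train_begin called")


class Logger(PrintingCallback):
    pass


class EarlyStop(PrintingCallback):
    pass


class ToyLearner:
    """Hypothetical stand-in for a learner; not fastai's Learner."""
    def __init__(self, callbacks=None):
        # Callbacks given here are stored and reused on every fit call.
        self.callbacks = list(callbacks or [])

    def fit(self, epochs, callbacks=None):
        # Callbacks given here are used for this call only.
        active = self.callbacks + list(callbacks or [])
        for cb in active:
            cb.on_train_begin()
        return [type(cb).__name__ for cb in active]


learner = ToyLearner(callbacks=[Logger()])
first = learner.fit(1, callbacks=[EarlyStop()])  # Logger and EarlyStop both run
second = learner.fit(1)                          # only Logger runs
```

In this toy version, EarlyStop is active only in the fit call it was passed to, while Logger is present in every fit because it was attached at construction.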