Hey, fastai users, what techniques are you currently using to balance multi-label data? I’ve been doing manual majority under-sampling before instantiating my ImageDataBunch, but I’m curious whether there’s a more automated or preferable way.
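For context, here’s roughly what my manual under-sampling looks like. This is a sketch, not fastai code: it assumes the labels live in a DataFrame with one-hot columns (the function name `undersample_majority` and all parameters are my own), and it only down-samples rows whose sole positive label is the majority class, so rows that share a rare label are never thrown away.

```python
import pandas as pd

def undersample_majority(df, label_cols, majority_label, seed=42):
    """Down-sample rows whose ONLY positive label is the majority label,
    keeping at most as many of them as there are other rows."""
    only_major = (df[majority_label] == 1) & (df[label_cols].sum(axis=1) == 1)
    majority_rows = df[only_major]
    other_rows = df[~only_major]
    n_keep = min(len(majority_rows), len(other_rows))
    kept = majority_rows.sample(n=n_keep, random_state=seed)
    # shuffle so the balanced frame has no block structure
    return pd.concat([other_rows, kept]).sample(frac=1, random_state=seed)
```

The balanced DataFrame can then be passed to `ImageDataBunch.from_df` as usual.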
Do there happen to be fastai utilities for any of the following?
- minority over-sampling
- majority under-sampling
- multi-label class weighting in the loss function
Any suggestions are greatly appreciated!
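For the third item, one workaround that doesn’t need a fastai utility is to weight the loss directly at the PyTorch level: `nn.BCEWithLogitsLoss` accepts a per-class `pos_weight` tensor, and you can set it from label frequencies. A minimal sketch, assuming your training labels are available as a 0/1 matrix (the helper name `make_weighted_bce` is mine):

```python
import torch
import torch.nn as nn

def make_weighted_bce(label_matrix):
    """Build a BCEWithLogitsLoss whose pos_weight up-weights rare labels.

    label_matrix: (n_samples, n_classes) float tensor of 0/1 labels.
    pos_weight per class = (# negatives) / (# positives), so a label
    present in few samples gets a proportionally larger penalty when missed.
    """
    pos_counts = label_matrix.sum(dim=0)                 # positives per class
    neg_counts = label_matrix.shape[0] - pos_counts      # negatives per class
    pos_weight = neg_counts / pos_counts.clamp(min=1)    # avoid divide-by-zero
    return nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```

You could then assign the result to `learn.loss_func` after creating the learner.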
An oversampling callback was recently written by @ilovescience: Oversampling Callback
@muellerzr Thanks. I just tried it, but it looks like it doesn’t currently support multi-label data:
```
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai/callbacks/oversampling.py in __init__(self, learn, weights)
     15         _, counts = np.unique(self.labels, return_counts=True)
     16         self.weights = (weights if weights is not None else
---> 17                         torch.DoubleTensor((1/counts)[self.labels]))
     18         self.label_counts = np.bincount([self.learn.data.train_dl.dataset.y[i].data for i in range(len(self.learn.data.train_dl.dataset))])
     19         self.total_len_oversample = int(self.learn.data.c*np.max(self.label_counts))

IndexError: arrays used as indices must be of integer (or boolean) type
```
It fails because multi-label targets are multi-hot arrays rather than integer class indices, so `(1/counts)[self.labels]` can’t index into the counts. But it’s good to know about this callback for non-multi-label problems!
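In case it helps anyone adapt the callback: for multi-hot labels the per-sample weight can’t come from indexing `1/counts` with the label, but you can compute one yourself. A hedged sketch (the function `multilabel_sample_weights` and the rarest-label heuristic are my own choices, not fastai’s): weight each sample by the inverse frequency of its rarest positive label, then feed the weights to PyTorch’s `WeightedRandomSampler`.

```python
import numpy as np
import torch

def multilabel_sample_weights(label_matrix):
    """Per-sample weights for multi-label oversampling.

    label_matrix: (n_samples, n_classes) array of 0/1 multi-hot labels.
    Each sample is weighted by the inverse frequency of its rarest
    positive label, so samples carrying scarce labels are drawn more often.
    """
    label_matrix = np.asarray(label_matrix, dtype=float)
    class_counts = label_matrix.sum(axis=0)            # positives per class
    inv_freq = 1.0 / np.clip(class_counts, 1, None)    # guard empty classes
    # per sample: largest inverse frequency among its positive labels
    weights = (label_matrix * inv_freq).max(axis=1)
    weights[weights == 0] = inv_freq.min()             # samples with no labels
    return torch.DoubleTensor(weights)
```

A `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights))` built from these weights could then stand in for the callback’s internal sampler on multi-label data.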