To help anyone who is trying to get some kind of OverSampler working, here is the “hack” I had to do to implement it. This is a first iteration and could be made more robust, so feel free to suggest improvements.
To give you some context: the splitter here splits the rows by index according to the Dataset column. Also note that the label_df DataFrame is sorted so that the training rows come first, which is why I use [:len(train_df)] on the weights.
label_db = DataBlock(
    blocks=(ImageBlock(cls=PILImageBW), MultiCategoryBlock),
    get_x=ColReader('Original_Filename', pref=raw_preprocess_folder+'/', suff='.png'),
    get_y=ColReader('Target'),
    splitter=TestColSplitter(col='Dataset'),
    item_tfms=item_tfms,
    batch_tfms=label_transform,
)
label_dl = label_db.dataloaders(label_df, bs=BATCH_SIZE, num_workers=0, shuffle_train=True, drop_last=True)
# Calculate sample weights to balance the DataLoader
from collections import Counter
count = Counter(label_dl.items['Target'])
class_weights = {}
for c in count:
    class_weights[c] = 1/count[c]
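# One weight per row; keep only the training rows (they come first in label_df)
wgts = label_dl.items['Target'].map(class_weights).values[:len(train_df)]
# Build a second DataLoaders whose training DataLoader samples according to wgts
weighted_dl = label_db.dataloaders(label_df, bs=BATCH_SIZE, num_workers=0, shuffle_train=True, drop_last=True, dl_type=WeightedDL, wgts=wgts)
# Swap the weighted training DataLoader into the original DataLoaders
label_dl.train = weighted_dl.train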
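
If it helps to see why 1/count weights balance the classes, here is a minimal standalone sketch in plain Python, with no fastai involved. The toy rows and every name in it are made up for illustration. It also counts classes over the training rows only and picks them by split value instead of by position, which is a variation on the snippet above and not what that snippet does.

```python
import random
from collections import Counter

# Hypothetical toy data standing in for label_df: (split, target) pairs.
rows = [
    ("train", "a"), ("train", "a"), ("train", "a"), ("train", "a"),
    ("train", "b"), ("train", "b"),
    ("train", "c"),
    ("valid", "a"), ("valid", "b"),
]

# Select training targets by split value, so the result does not
# depend on the training rows being sorted first.
train_targets = [target for split, target in rows if split == "train"]

# Inverse-frequency weight per class, then one weight per training sample.
counts = Counter(train_targets)
class_weights = {c: 1 / n for c, n in counts.items()}
sample_weights = [class_weights[t] for t in train_targets]

# Total weight per class: every class ends up with the same mass.
mass = Counter()
for t, w in zip(train_targets, sample_weights):
    mass[t] += w

# Draw with replacement in proportion to the weights (seeded for repeatability).
rng = random.Random(0)
drawn = rng.choices(train_targets, weights=sample_weights, k=3000)
drawn_counts = Counter(drawn)

print(dict(mass))
print(drawn_counts)
```

Since each class carries the same total weight, weighted sampling with replacement should pick every class about equally often, even though "a" has four times as many rows as "c". As far as I understand, that is the effect the weighted training DataLoader is after.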