WeightedDL + wgts not working for an imbalanced dataset?

Hello! I have an imbalanced dataset and am trying to use a WeightedDL, based on some previously asked questions on the forum.

Here’s my code:

items_path = "labeled-nn-unzipped"

items = [label_func(x) for x in get_image_files(items_path)]

wgts = [1/items.count(x) for x in items]

db = DataBlock(blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=TrainTestSplitter(random_state=42),
    get_y=label_func)

dls = db.dataloaders(items_path, dl_type=WeightedDL, wgts=wgts)

learn = vision_learner(dls, resnet34, metrics=error_rate)

I’m pretty sure I’m missing something really silly, but whenever I run it, the distribution of my learn.dls.train items is always the same.

Thanks in advance!

To me it seems you are assigning the same weight to all images.

Instead, consider assigning a higher weight to items from the minority class(es).
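To make "higher weight for minority items" concrete, here is a minimal sketch of per-item inverse-frequency weights in plain Python. The labels are made up for illustration and `inverse_frequency_weights` is a hypothetical helper, not part of fastai. It should give the same values as `1/items.count(x)`, but it counts each label only once instead of rescanning the list for every item.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return one weight per item: 1 / (number of items sharing its label)."""
    counts = Counter(labels)  # label -> number of occurrences
    return [1 / counts[label] for label in labels]

# Hypothetical labels: 6 items of a majority class, 2 of a minority class.
labels = ["cat"] * 6 + ["dog"] * 2
wgts = inverse_frequency_weights(labels)

# Each minority item gets a larger weight than each majority item,
# and the total weight per class comes out equal.
print(wgts[0], wgts[-1])
```

With weights like these, every class carries the same total weight, so a sampler that draws in proportion to the weights would be expected to pick each class about equally often.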

Hi Jürgen, thanks for replying. I’ve printed out items and weights, and there are definitely different values.

Here’s the distribution of my dls.train.items after running the above:

[image: plot of the dls.train.items label distribution]

Okay, using the weighted_dataloaders function appears to be what I was missing.

items_path = "labeled-nn-unzipped"

items = [label_func(x) for x in get_image_files(items_path)]

wgts = [1/items.count(x) for x in items]

db = DataBlock(blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files,
    splitter=TrainTestSplitter(random_state=42),
    get_y=label_func)

dls = db.weighted_dataloaders(items_path, wgts=wgts)

dls.show_batch(max_n=20, figsize=(20,5), ncols=10)
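For anyone who wants to sanity-check the weights themselves without fastai, here is a rough simulation. It uses `random.choices` from the standard library as a stand-in for weighted sampling with replacement; this is an assumption for illustration, not fastai's actual implementation, and the label counts are made up. As far as I understand, the weights affect which items get drawn into batches rather than the stored list of items, so counting labels over drawn samples is probably a more telling check than looking at the item list.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the simulation is repeatable

# Hypothetical imbalanced dataset: 900 "common" items, 100 "rare" items.
labels = ["common"] * 900 + ["rare"] * 100

# Inverse-frequency weight per item, same idea as 1/items.count(x).
counts = Counter(labels)
wgts = [1 / counts[label] for label in labels]

# Draw 10,000 samples with replacement, proportionally to the weights.
sampled = random.choices(labels, weights=wgts, k=10_000)
sampled_counts = Counter(sampled)

# Both classes should show up in roughly equal numbers.
print(sampled_counts)
```

If the two counts come out close to each other, the weights are doing what they should, and any remaining imbalance would have to come from how the weights are passed to the dataloader.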