Hello! I have an imbalanced dataset and am trying to use a WeightedDL, based on some previously asked questions on the forum.
Here’s my code block:
from fastai.vision.all import *  # label_func is my own labelling function, defined elsewhere

items_path = "labeled-nn-unzipped"
items = [label_func(x) for x in get_image_files(items_path)]
wgts = [1/items.count(x) for x in items]
db = DataBlock(blocks=(ImageBlock, CategoryBlock),
               get_items=get_image_files,
               splitter=TrainTestSplitter(random_state=42),
               get_y=label_func)
dls = db.dataloaders(items_path, dl_type=WeightedDL, wgts=wgts)
learn = vision_learner(dls, resnet34, metrics=error_rate)
I’m pretty sure I’m missing something really silly, but whenever I run it, the distribution of my learn.dls.train items is always the same.
Okay, using the weighted_dataloaders function appears to be what I was missing:
items_path = "labeled-nn-unzipped"
items = [label_func(x) for x in get_image_files(items_path)]
wgts = [1/items.count(x) for x in items]
db = DataBlock(blocks=(ImageBlock, CategoryBlock),
               get_items=get_image_files,
               splitter=TrainTestSplitter(random_state=42),
               get_y=label_func)
dls = db.weighted_dataloaders(items_path, wgts=wgts)
dls.show_batch(max_n=20, figsize=(20,5), ncols=10)
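For anyone who finds this later, here is a rough sketch of what the inverse-frequency weights above are meant to achieve. It is plain Python and does not call fastai: the label list is made up, `inverse_frequency_weights` is a hypothetical helper name, and `random.choices` is only a stand-in for weighted sampling with replacement, not a claim about how WeightedDL works internally. It also uses `collections.Counter` so the labels are counted once rather than calling `items.count(x)` for every item.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """Return one weight per item: 1 / (number of items sharing its label)."""
    counts = Counter(labels)  # count every label in a single pass
    return [1 / counts[label] for label in labels]

# Made-up imbalanced labels, standing in for
# [label_func(x) for x in get_image_files(items_path)]
labels = ["cat"] * 900 + ["dog"] * 100
wgts = inverse_frequency_weights(labels)

# Each class's weights sum to 1, so both classes are equally likely to be drawn.
rng = random.Random(42)
sampled = rng.choices(labels, weights=wgts, k=10_000)
sampled_counts = Counter(sampled)
print(sampled_counts)
```

With weights like these, the drawn labels should come out roughly half "cat" and half "dog" even though the raw data is 90/10, which is the kind of change I was expecting to see in the training batches.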