02 production - cannot use cleaner to clean data set

I manage to train the ResNet18 as suggested and get a classification error of about 2%. I then try to use the cleaner

from fastai.vision.widgets import ImageClassifierCleaner
cleaner = ImageClassifierCleaner(learn)

and retrieve the indices of the images to be moved or deleted.
I then run

import shutil
for idx in cleaner.delete(): cleaner.fns[idx].unlink()
for idx, cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)
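A minimal sketch of what the two loops above do to the folder, using only the standard library so it can run without fastai. The names `apply_cleanup` and `list_images` are hypothetical: `fns`, `to_delete` and `to_change` stand in for `cleaner.fns`, `cleaner.delete()` and `cleaner.change()`. The relabelling step assumes the DataBlock labels each image by its parent folder (`parent_label`, as in the book's bears example), and `list_images` is only a simplified stand-in for re-scanning the folder the way `get_image_files` does.

```python
import shutil
import tempfile
from pathlib import Path


def apply_cleanup(fns, to_delete, to_change, path):
    """Delete and relabel image files on disk (hypothetical helper).

    fns       -- list of Paths, standing in for cleaner.fns
    to_delete -- indices into fns, standing in for cleaner.delete()
    to_change -- (index, new_category) pairs, standing in for cleaner.change()
    path      -- dataset root containing one folder per category
    """
    for idx in to_delete:
        # Path.unlink() removes the file itself from disk
        fns[idx].unlink()
    for idx, cat in to_change:
        # Moving the file into path/cat changes its label, assuming labels
        # are taken from the parent folder name
        shutil.move(str(fns[idx]), str(path / cat))


def list_images(path):
    """Re-scan the dataset folder (simplified: .jpg files only)."""
    return sorted(p.relative_to(path).as_posix() for p in path.rglob("*.jpg"))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp)
    for cat in ("grizzly", "black", "teddy"):
        (path / cat).mkdir()
    fns = [path / "grizzly" / "a.jpg", path / "grizzly" / "b.jpg", path / "black" / "c.jpg"]
    for fn in fns:
        fn.touch()

    # delete a.jpg, relabel c.jpg from "black" to "teddy"
    apply_cleanup(fns, to_delete=[0], to_change=[(2, "teddy")], path=path)
    print(list_images(path))  # ['grizzly/b.jpg', 'teddy/c.jpg']
```

Because the folder is scanned again whenever the file list is rebuilt, the deleted file no longer appears and the moved file shows up under its new category.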

I then rerun

bears = bears.new(
    item_tfms=RandomResizedCrop(224, min_scale=0.5))
dls = bears.dataloaders(path)
learn = cnn_learner(dls, resnet18, metrics=error_rate)

But when I inspect the highest losses, I find that the images I deleted or reclassified are still in the dataset, so my model does not improve. How can I properly move or delete the images in the dataset? The book just says ‘retrain the model and see if your accuracy improves’, but I did not succeed.

Hi @tonio,
I haven’t used this myself yet, but my guess is that cleaner.fns[idx].unlink() removes the link to the file in your bears object, and when you run bears.new(..) the unlinked ones are listed again. Try it without bears.new(..) and see if this helps.
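What `unlink()` actually does can be checked with the standard library alone. This sketch assumes `cleaner.fns` holds `pathlib.Path` objects; `fns` below is a hypothetical stand-in for it. `Path.unlink()` deletes the file on disk and does not modify any Python list that still holds the path, so the list keeps the entry while a fresh scan of the folder no longer finds the file.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    fn = Path(tmp) / "bear.jpg"
    fn.touch()
    fns = [fn]  # hypothetical stand-in for a list such as cleaner.fns

    fn.unlink()  # deletes the file on disk; the list object is untouched

    print(len(fns))                   # 1: the Path is still in the list
    print(fns[0].exists())            # False: the file it names is gone
    print(list(Path(tmp).iterdir()))  # []: a fresh scan no longer finds it
```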

The changed ones are a mystery, though. They should not show up with the wrong class, since they were actually moved on disk.


Hello @JackByte, thank you for your answer. I tried that, but it did not work either. I have moved on with the course for now and will come back to this when we use the cleaner again and I have more insight. Thanks anyway.