Jupyter plugin to fix mislabeled data

yeldarb · February 14, 2019, 7:21pm

In one of the videos (can’t remember if it was 2019 Deep Learning or 2018 Machine Learning), Jeremy showed a tool that one of the study groups had created to remove mislabeled data (it showed you a batch of images and you clicked on the ones that were labeled wrong).

I’m working with one of my own datasets now and noticed that many of the “errors” in plot_top_losses are actually mislabeled in my dataset. I’ve been searching the web and this forum for a link to that plugin but can’t find it.

Can anyone point me in the right direction?

yeldarb · February 14, 2019, 7:33pm

Found it! Looks like it is built into fastai: it used to be FileDeleter and is now ImageDeleter and ImageRelabeler.

More info here:

Edit: And looks like they’ve since been refactored into widgets.image_cleaner

Update: after fiddling around with this for way too long, I wanted to report back to say that the step in the docs about creating a new learner using .no_split() and loading your saved weights into it is actually very important. Before doing that, I was getting nonsensical loss values and the widget was showing seemingly random images (I think because it was trying to use DatasetType.Fix internally).
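For anyone landing here later, this is a rough sketch of that step as I understand it from the fastai v1 docs, not a definitive recipe. The function name `build_cleaner`, the weights name `"stage-2"`, the image size, and the resnet34 architecture are all assumptions for illustration; use whatever you trained and saved with. Some names also differ between fastai v1 releases (for example `ImageList` and `cnn_learner` may be `ImageItemList` and `create_cnn` in older ones), so check against your installed version.

```python
def build_cleaner(path, weights="stage-2", size=224):
    """Rebuild an unsplit learner, load saved weights, and open the cleaner widget."""
    # Imports are local so this file can be loaded without fastai installed.
    from fastai.vision import ImageList, cnn_learner, get_transforms, models
    from fastai.metrics import error_rate
    from fastai.widgets import DatasetFormatter, ImageCleaner

    items = ImageList.from_folder(path)
    # The post calls this `.no_split()`; fall back to `split_none` (assumed
    # to be the name in later v1 releases) if it is missing.
    split = getattr(items, "no_split", None) or items.split_none
    data = (split()
            .label_from_folder()
            .transform(get_transforms(), size=size)
            .databunch())

    # Assumption: the weights were trained with resnet34; the architecture
    # here has to match whatever produced the saved weights.
    learn = cnn_learner(data, models.resnet34, metrics=error_rate)
    learn.load(weights)

    # Rank the whole (unsplit) dataset by loss and hand it to the widget.
    ds, idxs = DatasetFormatter().from_toplosses(learn)
    return ImageCleaner(ds, idxs, path)
```

Call it from a notebook cell with the folder your images live in. The point of the unsplit data is that every image ends up in one dataset, so the indices the widget gets back line up with the files it shows.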