ImageClassifierCleaner - Cleaning multiple folders

Hi all,
I'm currently using ImageClassifierCleaner to clean up the dataset for the bear classifier from chapter 2 of the book. However, I noticed an issue: the widget won't keep my selections when I switch to a new folder to clean.

For example:

The screenshot shows me selecting pictures in the Train set that don't represent a teddy bear and should be removed.

Next I move on to cleaning the next folder, i.e. grizzly in the Train set, as shown below.

Now, when I come back to my earlier folder, i.e. teddy bear in the Train set, all the states are gone. That is, the selections I made for the pictures to be deleted have been lost.

How do I go about cleaning data from multiple folders?


I’ve been trying to figure out the same thing. Did you get anywhere with this problem?

@unaveenj @jfww I was just wondering about this as well today: it does not look like you can clean multiple folders at once, you would have to do it in a cascaded way (multiple times).

I went back and re-read the book notebook, and indeed it calls ImageClassifierCleaner one time per folder. This is because ImageClassifierCleaner does not actually delete the pictures, it just returns their indices. If you do it multiple times and run the code that uses cleaner.delete() and/or cleaner.change() each time, all changes are taken care of.

To demonstrate this I have run ImageClassifierCleaner twice. The first time I moved a grizzly bear image from my black bear folder to the grizzly bear folder, the second time I deleted a picture of a seal from the grizzly bear folder. I then check the number of images with:

fns = get_image_files(path)
len(fns)

and the image count had gone down by one.
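A single total hides the move between folders, so a per-folder count can make the check easier to read. Here is a minimal sketch using only the standard library; the function name `count_images_per_folder` and the list of image suffixes are my own assumptions, not part of fastai:

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}

def count_images_per_folder(path):
    """Return {folder_name: number_of_image_files} for each subfolder of path."""
    path = Path(path)
    counts = {}
    for sub in sorted(p for p in path.iterdir() if p.is_dir()):
        counts[sub.name] = sum(
            1
            for f in sub.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `count_images_per_folder(path)` before and after a cleaning pass should show one folder shrinking and another growing when an image was moved, and a folder shrinking when an image was deleted.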

An alternative is the imagecleaner from Joe Dockrill’s jmd_imagescraper, which deletes the images directly.

  • docs in here
  • blog post in here

It looks like I was not correct in my first reply: the proper way is to run the cleaning code once each time you switch folder with the interactive cleaner (twice, in my example above), but without running ImageClassifierCleaner multiple times.
There is a detailed explanation of how to do this in the blog post Classifying Images of Alcoholic Beverages with fast.ai v2.

I would recommend working through that blog post. It also shows how to re-train the model after deleting the images.
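For reference, the per-folder routine can be sketched as below. It wraps the two loops from chapter 2 of the book (one over cleaner.delete(), one over cleaner.change()) in a helper so it is easy to re-run after each folder. The helper name `apply_cleaner_changes` is hypothetical, and it only assumes that `cleaner` exposes `delete()`, `change()` and `fns` the way ImageClassifierCleaner does in the book notebook:

```python
import shutil
from pathlib import Path

def apply_cleaner_changes(cleaner, path):
    """Apply the selections currently held by the cleaner widget.

    cleaner: object with delete() -> indices, change() -> (index, category)
             pairs, and fns -> list of file paths (as in the book notebook).
    path:    dataset root that contains one subfolder per category.
    Returns (number_deleted, number_moved).
    """
    path = Path(path)
    # Read both selections before touching any file.
    to_delete = list(cleaner.delete())
    to_move = list(cleaner.change())
    for idx in to_delete:
        Path(cleaner.fns[idx]).unlink()  # remove the file from disk
    for idx, cat in to_move:
        # Move the file into the folder of its corrected category.
        shutil.move(str(cleaner.fns[idx]), str(path / cat))
    return len(to_delete), len(to_move)
```

The idea is to create the cleaner once, make your selections for one category and Train/Valid combination, call `apply_cleaner_changes(cleaner, path)` in the next cell, and only then switch the dropdown to the next folder and repeat.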

A further extensive example is shown in the article Image classification from scratch to deployment. The difference is that the cleaning process demonstrated there lets you directly delete the images that have the highest loss during the initial training; additionally, potential duplicates are also addressed. Again, how the model is re-trained after the cleaning is shown in detail.

@unaveenj @jfww
In the notebook 02a_production_jmd_image_scraper_train_and_inference.ipynb in this repo I used jmd_imagescraper to get images, then jmd_imagescraper.imagecleaner to do a first clean-up (before modeling), and then a model-driven clean-up using fastai.widgets.image_cleaner. It all works well, I hope it is helpful.