Lesson 2 - Cleaning Up

Hi All,

I'm very new to this and have been working on an image classifier that can identify Michael Cera, Jesse Eisenberg, and Andy Samberg, after I saw a meme comparing them the other day haha. I got to about 80% accuracy before I realized that I hadn't pruned the dataset well enough: there were some images in the training and validation sets that may be throwing things off.

I’ve been trying to use the ImageCleaner widget to fix this problem and have reached the point where it will create a new cleaned.csv file in my directory.

When I try to generate a new ImageDataBunch I run into a problem. I copied the code I originally used to download the images from the txt files of URLs that I got from Google Images. Here is the code I used:

folder = 'cleaned'
file = 'cleaned.csv'
path = Path('data/actors')
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)
download_images(path/file, dest, max_pics=600, max_workers=0)
classes = ['samberg','cera','eisenberg']

np.random.seed(42)
cleaned_data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
     ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
cleaned_data.show_batch(rows=3, figsize=(7,8))

cleaned_data.classes, cleaned_data.c, len(cleaned_data.train_ds), len(cleaned_data.valid_ds)

When I check the length of the cleaned data, it is the same as the original ImageDataBunch, and it still includes the images that I had deleted with the ImageCleaner widget. Does anyone know what I am doing wrong? Can someone please help?

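If you want to use only the images listed in cleaned.csv, try using ImageDataBunch.from_csv() instead of from_folder(), and add a csv_labels= argument to specify which csv file to read, e.g. csv_labels='cleaned.csv'. ImageCleaner doesn't delete anything from disk, it only writes cleaned.csv, so from_folder() still picks up every image under your path.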
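Here is a rough sketch of what that could look like, assuming fastai v1 and that cleaned.csv sits in data/actors with a header row and the file path (relative to that folder) in the first column. The helper names images_not_in_csv and load_cleaned are made up for illustration and are not part of fastai. The first function only uses the standard library and lists the images that are still on disk but no longer in the csv; the second builds the DataBunch from the csv with the same settings as the original from_folder call.

```python
import csv
from pathlib import Path

# Assumption: these are the image extensions present in the dataset.
IMAGE_EXTS = {'.jpg', '.jpeg', '.png'}

def images_not_in_csv(path, csv_name='cleaned.csv'):
    """Return image files under `path` that are NOT listed in the csv.

    Hypothetical helper. Assumes the csv has a header row and that the
    first column holds file paths relative to `path`.
    """
    path = Path(path)
    with open(path/csv_name, newline='') as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        kept = {Path(row[0]).as_posix() for row in reader if row}
    on_disk = {p.relative_to(path).as_posix() for p in path.rglob('*')
               if p.is_file() and p.suffix.lower() in IMAGE_EXTS}
    return sorted(on_disk - kept)

def load_cleaned(path, csv_name='cleaned.csv'):
    """Build an ImageDataBunch from the files listed in the csv only.

    Hypothetical wrapper. The fastai v1 imports live inside the function
    so the helper above can be used without fastai installed.
    """
    import numpy as np
    from fastai.vision import ImageDataBunch, get_transforms, imagenet_stats
    np.random.seed(42)  # same seed as the original post, for a repeatable split
    return ImageDataBunch.from_csv(
        path, csv_labels=csv_name, valid_pct=0.2,
        ds_tfms=get_transforms(), size=224, num_workers=4,
    ).normalize(imagenet_stats)
```

Calling images_not_in_csv('data/actors') should list the images you removed in the widget, and cleaned_data = load_cleaned('data/actors') should then give smaller train and validation sets than before. There should be no need to run download_images again, since the images are already on disk and cleaned.csv is not a list of URLs.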

btw I had a similar idea to yours: I made a classifier to identify Margot Robbie vs Jaime Pressly. My human eyes can never tell them apart haha
