Hi All,
I'm very new to this and have been working on an image classifier that can tell apart Michael Cera, Jesse Eisenberg, and Andy Samberg after I saw a meme comparing them the other day haha. I got to about 80% accuracy before realizing I hadn't pruned my dataset well enough: there were some images in the training and validation sets that may be throwing things off.
I've been trying to use the ImageCleaner widget to fix this, and I've gotten to the point where it creates a new cleaned.csv file in my directory.
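As a sanity check I poked at the file with plain Python. Here's a toy stand-in for what I believe cleaned.csv looks like (a `name` column with the relative image path and a `label` column), and how I counted what survived pruning (the paths and labels below are just made up for illustration):

```python
import csv
import os
import tempfile
from collections import Counter

# Toy stand-in for the cleaned.csv that ImageCleaner writes out --
# I believe it has a "name" column and a "label" column.
tmpdir = tempfile.mkdtemp()
csv_path = os.path.join(tmpdir, 'cleaned.csv')
with open(csv_path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['name', 'label'])
    writer.writerow(['cera/00000001.jpg', 'cera'])
    writer.writerow(['samberg/00000002.jpg', 'samberg'])
    writer.writerow(['eisenberg/00000003.jpg', 'eisenberg'])

# Count rows and per-class totals -- on my real cleaned.csv this comes
# out lower than the number of files on disk, so the pruning did register.
with open(csv_path) as f:
    rows = list(csv.DictReader(f))

kept = len(rows)
per_class = Counter(r['label'] for r in rows)
print(kept, per_class)
```

On my real data the row count is indeed smaller than the original download, so the widget is writing out only the kept images.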
When I try to generate a new ImageDataBunch from it, I run into a problem. I reused the code I originally wrote to download the images from the URL .txt files I got from Google Images. Here is the code:
```python
folder = 'cleaned'
file = 'cleaned.csv'

path = Path('data/actors')
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)

download_images(path/file, dest, max_pics=600, max_workers=0)

classes = ['samberg', 'cera', 'eisenberg']

np.random.seed(42)
cleaned_data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
                                          ds_tfms=get_transforms(), size=224,
                                          num_workers=4).normalize(imagenet_stats)

cleaned_data.show_batch(rows=3, figsize=(7,8))
cleaned_data.classes, cleaned_data.c, len(cleaned_data.train_ds), len(cleaned_data.valid_ds)
```
When I check the length of the cleaned data, it is the same as the original ImageDataBunch, and it still includes the images I deleted with the ImageCleaner widget. Does anyone know what I am doing wrong? Can someone please help?
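One workaround I considered (just a sketch, I haven't run it on my real data): since `from_folder` reads everything on disk, maybe I need to physically remove the files that aren't listed in cleaned.csv before rebuilding the databunch. The helper below is my own invention, and it assumes cleaned.csv has a `name` column with paths relative to the data folder, which is how I understand ImageCleaner writes it:

```python
import csv
from pathlib import Path

def prune_to_csv(path, csv_name='cleaned.csv'):
    """Delete .jpg files under `path` that aren't listed in cleaned.csv.

    Assumes cleaned.csv has a "name" column holding paths relative to
    `path` (my reading of what ImageCleaner writes). Returns how many
    files were removed.
    """
    path = Path(path)
    with open(path / csv_name) as f:
        keep = {path / row['name'] for row in csv.DictReader(f)}
    removed = 0
    for img in path.rglob('*.jpg'):
        if img not in keep:
            img.unlink()   # physically remove the pruned image
            removed += 1
    return removed
```

Would that work, or is there a cleaner way to build the databunch straight from cleaned.csv (maybe `ImageDataBunch.from_csv`?) without touching the files on disk?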