Hi all! I’m reading the wonderful Python for coders, and I noticed in chapter 2 that when you move images using the ImageClassifierCleaner, you may end up moving an image whose filename already exists in the destination folder. In that case the image is not moved, because that file already exists.
This is the line that appears in the book to move the images from the ImageClassifierCleaner:
for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)
If I need to move 2.jpg from the teddy folder to the grizzly folder, it’s not gonna move unless it has a different filename (because there is already a 2.jpg file in the directory where you want to move the image).
Am I getting this wrong? If it’s actually an error, is there any way to handle it? I was thinking of renaming the image before moving it, but I’m not sure how to do that.
tbh i’m not sure whether the image names which you end up with using the bing image search in the book actually are 1.jpg, 2.jpg etc, but I’ll suggest two ways to handle it.
you can use my image scraper library to create your dataset, which thanks to @butchland now has unique filenames across folders for exactly this reason.
if you already have a dataset you’re happy with you could simply write a move(fn, dest) function which checks ‘dest’ and automatically fixes the filename being moved if it already exists, and use that in place of shutil.move()
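A minimal sketch of that second option, in case it helps. The names safe_move and unique_dest and the "_1", "_2" suffix scheme are just illustrative choices, not something from the book or from fastai:

```python
import shutil
from pathlib import Path


def unique_dest(fn, dest_dir):
    """Return a path inside dest_dir that is not taken yet (hypothetical helper)."""
    fn, dest_dir = Path(fn), Path(dest_dir)
    candidate = dest_dir / fn.name
    n = 1
    # Keep appending _1, _2, ... to the stem until the name is free.
    while candidate.exists():
        candidate = dest_dir / f"{fn.stem}_{n}{fn.suffix}"
        n += 1
    return candidate


def safe_move(fn, dest_dir):
    """Move fn into dest_dir, renaming it if that filename already exists there."""
    target = unique_dest(fn, dest_dir)
    shutil.move(str(fn), str(target))
    return target
```

Assuming cleaner and path are set up as in the book, you would then call safe_move(cleaner.fns[idx], path/cat) inside the cleaner.change() loop in place of shutil.move().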
Great Joe!! Thanks a lot for that. So I guess in this case you’re using DuckDuckGo instead of Bing, right? Is there any alternative that uses Google Images? Just to see if there are other options. (I’m gonna use your library anyway.)
Thank you so much for this image scraper library. It is very easy to use and much simpler than the procedure of using Bing Image Search API. This should be added to the course page!
It’s not 100% strictly complying with ddg terms of service. No image scraper libraries for any search engine are. Fast.ai can’t really endorse anything like this.
tbh i’m not sure whether the image names which you end up with using the bing image search in the book actually are 1.jpg, 2.jpg etc
That is the case Joe. They show up as 000001.jpg, 000002.jpg and so on…
Rather than adding an image scraper, do you have other ideas? Why doesn’t it work out of the box when using the Bing search engine? The lesson 2 notebook (02_production.ipynb) was even written for Bing.
Here is the script I used to rename the files and circumvent the issue. I run it right after the files have been loaded.
import os
for folder in os.listdir(path):
    # skip the checkpoints folder that Jupyter creates
    if folder == ".ipynb_checkpoints":
        continue
    sub_path = os.path.join(path, folder)
    for _file in os.listdir(sub_path):
        # prefix each file with its folder name, e.g. teddy/000002.jpg -> teddy/teddy000002.jpg
        old_file = os.path.join(sub_path, _file)
        new_file = os.path.join(sub_path, folder + _file)
        os.rename(old_file, new_file)
Hi,
You can also change the file name before moving it to the new folder.
Here I just added the current folder name to the file number so you can replace this code:
for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)
with this code:
for idx, cat in cleaner.change():
    f = cleaner.fns[idx]
    # prefix the file name with its current folder name, e.g. teddy/2.jpg -> teddy/teddy2.jpg
    renamed = f.parent/(f.parent.name + f.name)
    os.rename(f, renamed)
    shutil.move(str(renamed), path/cat)
And you can continue to clean your dataset without any additional errors.