Error in Chapter 2?

Hi all! I’m reading the wonderful Python for coders, and I noticed in chapter 2 that when you move images with the ImageClassifierCleaner, you may end up moving an image whose filename already exists in the destination folder. In that case the image is not moved, because a file with that name is already there.

This is the line that appears in the book to move the images from the ImageClassifierCleaner:

for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)

For example, if I have two category folders:

- grizzly
    - 1.jpg
    - 2.jpg
    ...
- teddy
    - 1.jpg
    - 2.jpg
    ...

If I need to move 2.jpg from the teddy folder to the grizzly folder, it’s not going to be moved unless it has a different filename (because there is already a 2.jpg in the directory I want to move it to).
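A minimal sketch that reproduces the collision in a throwaway temp directory, assuming the two-folder layout above (the folder names, file name, and the `reproduce_collision` function are only for illustration). When the destination is an existing directory that already holds a file with the same name, `shutil.move` raises `shutil.Error` instead of overwriting it:

```python
import shutil
import tempfile
from pathlib import Path

def reproduce_collision():
    """Create grizzly/2.jpg and teddy/2.jpg in a temp dir, then try the move."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for cat in ("grizzly", "teddy"):
            (root / cat).mkdir()
            (root / cat / "2.jpg").write_bytes(b"fake image data")
        try:
            # Same call shape as the book's line: a file moved into a folder.
            shutil.move(str(root / "teddy" / "2.jpg"), str(root / "grizzly"))
        except shutil.Error as err:
            return f"move refused: {err}"
        return "moved"

print(reproduce_collision())
```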

Am I getting this wrong? If it’s actually an error, is there any way to handle it? I was thinking of renaming the image before moving it, but I’m not sure how to do that.

Thanks in advance!

tbh i’m not sure whether the image names which you end up with using the bing image search in the book actually are 1.jpg, 2.jpg etc, but I’ll suggest two ways to handle it.

  1. you can use my image scraper library to create your dataset, which thanks to @butchland now has unique filenames across folders for exactly this reason.
pip install jmd_imagescraper

from jmd_imagescraper.core import *
from pathlib import Path

root = Path().cwd()/"images"

duckduckgo_search(root, "Grizzly", "grizzly bears", max_results=150)
duckduckgo_search(root, "Teddy", "teddy bears", max_results=150)
  2. if you already have a dataset you’re happy with you could simply write a move(fn, dest) function which checks ‘dest’ and automatically fixes the filename being moved if it already exists, and use that in place of shutil.move()
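One possible shape for the second option, as a rough sketch only. `safe_move` is a hypothetical helper name (it is not part of fastai or shutil), and the `_1`, `_2`, ... suffix scheme is just one assumption about how to "fix" a clashing filename:

```python
import shutil
from pathlib import Path

def safe_move(fn, dest):
    """Move file `fn` into directory `dest`, renaming it if the name is taken.

    Hypothetical helper, not part of fastai or shutil.
    Returns the final path of the moved file.
    """
    fn, dest = Path(fn), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / fn.name
    counter = 1
    # Append _1, _2, ... to the stem until the name is free in `dest`.
    while target.exists():
        target = dest / f"{fn.stem}_{counter}{fn.suffix}"
        counter += 1
    shutil.move(str(fn), str(target))
    return target
```

If this works for your dataset, the line from the book would presumably become `for idx,cat in cleaner.change(): safe_move(cleaner.fns[idx], path/cat)`.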

Great Joe!! Thanks a lot for that. So I guess in this case you’re using DuckDuckGo instead of Bing, right? Is there any alternative that uses Google Images? Just to see if there are other options too. (I’m gonna use your library anyway.)

Thanks again

Google is no longer an attractive option.

Thank you so much for this image scraper library. It is very easy to use and much simpler than the procedure of using Bing Image Search API. This should be added to the course page!

It’s not 100% strictly complying with ddg terms of service. No image scraper libraries for any search engine are. Fast.ai can’t really endorse anything like this.

Aha, I got it … But still, it is very handy especially for beginners like me :smiley: … thank you for making it!

tbh i’m not sure whether the image names which you end up with using the bing image search in the book actually are 1.jpg, 2.jpg etc

That is the case Joe. They show up as 000001.jpg, 000002.jpg and so on…

Rather than adding an image scraper, do you have other ideas? Why doesn’t it work out of the box when using the Bing search engine? The lesson 2 notebook (02_production.ipynb) was even written for Bing.

That would do it, but my first solution isn’t a drama either. It’s just downloading them from a different source.

Yeah… I have ended up renaming it. That seems simpler indeed.

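import os

direc = "images/teddy"  # hypothetical folder; point this at your own category folder
# Prefix every file name with its index so names no longer clash with the other folder.
for count, filename in enumerate(os.listdir(direc)):
    print(filename, count)
    src = os.path.join(direc, filename)
    dst = os.path.join(direc, f"{count}_{filename}")
    os.rename(src, dst)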
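A variation on the same renaming idea, sketched with pathlib: prefix every file with the name of its category folder, so names stay unique no matter which folder an image is later moved to. `prefix_with_folder_name` is a hypothetical helper, and it assumes a layout of one sub-folder per category under a single root:

```python
from pathlib import Path

def prefix_with_folder_name(root):
    """Rename root/<cat>/<name> to root/<cat>/<cat>_<name> for every file.

    Hypothetical helper; files that already carry the prefix are skipped,
    so running it twice is harmless. Returns the list of new paths.
    """
    renamed = []
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        prefix = f"{folder.name}_"
        # sorted() materialises the listing, so renames don't disturb iteration.
        for f in sorted(folder.iterdir()):
            if f.is_file() and not f.name.startswith(prefix):
                new_path = f.with_name(prefix + f.name)
                f.rename(new_path)
                renamed.append(new_path)
    return renamed
```

This would probably need to run before the cleaner is created, since the cleaner keeps the old file paths in `cleaner.fns`.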