Error in Chapter 2?

santiagopaz · October 12, 2020, 3:34pm

Hi all! I’m reading the wonderful Deep Learning for Coders, and I noticed in chapter 2 that, when you move images with the ImageClassifierCleaner, you may try to move an image whose filename already exists in the destination folder. In that case the image isn’t moved, because a file with that name is already there.

This is the line that appears in the book to move the images from the ImageClassifierCleaner:

for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)

For example, if I have two category folders:

- grizzly
    - 1.jpg
    - 2.jpg
    ...
- teddy
    - 1.jpg
    - 2.jpg
    ...

If I need to move 2.jpg from the teddy folder to the grizzly folder, it won’t be moved unless it has a different filename, because the grizzly folder already contains a 2.jpg (shutil.move raises a “Destination path … already exists” error instead).
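To see the collision concretely, here is a small reproduction in a throwaway temporary folder. It uses only the standard library, and the empty placeholder files stand in for real images:

```python
import shutil
import tempfile
from pathlib import Path

# Build a throwaway copy of the two-folder layout from the post
root = Path(tempfile.mkdtemp())
for label in ("grizzly", "teddy"):
    (root/label).mkdir()
    (root/label/"2.jpg").write_bytes(b"")  # empty placeholder, not a real image

src = root/"teddy"/"2.jpg"
try:
    # Same call shape as the book: move the file *into* the category folder
    shutil.move(str(src), root/"grizzly")
    collided = False
except shutil.Error as err:
    # grizzly/2.jpg already exists, so shutil.move refuses to overwrite it
    collided = True
    print(err)

print("collided:", collided)
```

The source file stays where it was, which is why the cleaner appears to silently skip it.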

Am I getting this wrong? If it’s actually an error, is there any way to handle it? I was thinking of renaming the image before moving it, but I’m not sure how to do that.

Thanks in advance!

joedockrill · October 12, 2020, 7:56pm

tbh I’m not sure whether the images you end up with from the Bing image search in the book are actually named 1.jpg, 2.jpg, etc., but I’ll suggest two ways to handle it.

You can use my image scraper library to create your dataset, which, thanks to @butchland, now gives files unique names across folders for exactly this reason.

pip install jmd_imagescraper

from jmd_imagescraper.core import *
from pathlib import Path

root = Path.cwd()/"images"

duckduckgo_search(root, "Grizzly", "grizzly bears", max_results=150)
duckduckgo_search(root, "Teddy", "teddy bears", max_results=150)

If you already have a dataset you’re happy with, you could simply write a move(fn, dest) function that checks ‘dest’ and automatically changes the filename being moved if it already exists there, and use that in place of shutil.move().
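A minimal sketch of such a `move(fn, dest)` helper, assuming that appending `_1`, `_2`, and so on to the file stem is an acceptable way to make a name unique. The helper name and the suffix scheme are illustrative only; they are not part of fastai or jmd_imagescraper:

```python
import shutil
from pathlib import Path


def move(fn, dest):
    """Move file `fn` into directory `dest`, renaming it if the name is taken.

    Hypothetical helper: appends _1, _2, ... to the stem until the name is free.
    Returns the new path of the file.
    """
    fn, dest = Path(fn), Path(dest)
    target = dest/fn.name
    n = 1
    while target.exists():
        target = dest/f"{fn.stem}_{n}{fn.suffix}"
        n += 1
    # target is a full file path that does not exist yet, so shutil.move won't refuse
    return Path(shutil.move(str(fn), str(target)))
```

In the notebook it would stand in for the book's loop as `for idx,cat in cleaner.change(): move(cleaner.fns[idx], path/cat)`. Checking before moving (rather than renaming every file up front) leaves files that don't collide untouched.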

santiagopaz · October 12, 2020, 10:36pm

Great, Joe!! Thanks a lot for that. So I guess in this case you’re using DuckDuckGo instead of Bing, right? Is there any way to use Google Images? Just to see whether there are other alternatives too. (I’m going to use your library anyway.)

Thanks again

joedockrill · October 13, 2020, 5:04am

Google is no longer an attractive option.

ahmed3yad · October 14, 2020, 7:56pm

Thank you so much for this image scraper library. It is very easy to use and much simpler than going through the Bing Image Search API. This should be added to the course page!

joedockrill · October 14, 2020, 8:17pm

It doesn’t strictly comply with DuckDuckGo’s terms of service. No image scraper library for any search engine does. fast.ai can’t really endorse anything like this.

ahmed3yad · October 14, 2020, 8:36pm

Aha, I got it… But still, it’s very handy, especially for beginners like me. Thank you for making it!

thetj09 · October 16, 2020, 6:05pm

“tbh I’m not sure whether the images you end up with from the Bing image search in the book are actually named 1.jpg, 2.jpg, etc.”

That is the case, Joe. They show up as 000001.jpg, 000002.jpg, and so on…

Rather than switching to an image scraper, do you have other ideas? Why doesn’t it work out of the box with the Bing search engine? The lesson 2 notebook (02_production.ipynb) was even written for Bing.
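Going by the naming described above, each category folder gets its own 000001.jpg, 000002.jpg, and so on, so the same names exist in several folders. A hypothetical helper like the one below (standard library only) lists the names that appear in more than one category folder, i.e. the files that would collide when moved:

```python
from collections import defaultdict
from pathlib import Path


def shared_filenames(root):
    """Return {filename: [folders]} for names found in more than one category folder.

    Hypothetical helper; assumes one sub-folder per category directly under `root`.
    """
    seen = defaultdict(list)
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        if folder.name == ".ipynb_checkpoints":  # notebook metadata, not a category
            continue
        for f in folder.iterdir():
            if f.is_file():
                seen[f.name].append(folder.name)
    # Keep only the names that occur in at least two folders
    return {name: dirs for name, dirs in seen.items() if len(dirs) > 1}
```

An empty result means the book’s `shutil.move` loop can’t hit this error for that dataset.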

joedockrill · October 16, 2020, 6:57pm

That would explain it, but my first solution isn’t much trouble either. It just downloads the images from a different source.

thetj09 · October 16, 2020, 9:03pm

Yeah… I ended up renaming the files. That does seem simpler.

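import os
direc = "bears/teddy/"  # folder to rename (example path; note the trailing slash)
cont = "teddy_"         # prefix to add (example; the folder name keeps names unique)
for count, filename in enumerate(os.listdir(direc)):
    print(filename, count)
    src = direc + filename
    dst = direc + cont + filename
    os.rename(src, dst)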

dilovan · March 3, 2021, 3:06pm

Here is the script I used to rename the files and circumvent the issue. I run it right after the files have been downloaded.

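import os
# path is the dataset root from the notebook (one sub-folder per category)
for label in os.listdir(path):
    sub_path = os.path.join(path, label)
    # skip notebook checkpoints and anything that isn't a category folder
    if label == ".ipynb_checkpoints" or not os.path.isdir(sub_path):
        continue
    for _file in os.listdir(sub_path):
        old_file = os.path.join(sub_path, _file)
        new_file = os.path.join(sub_path, label + _file)  # e.g. teddy/000001.jpg -> teddy/teddy000001.jpg
        os.rename(old_file, new_file)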
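One caveat with prefix renaming is that running it a second time prefixes the files again (teddyteddy000001.jpg). A pathlib-based sketch that skips files already carrying the prefix could look like this; the helper name, the `_` separator, and skipping hidden folders such as `.ipynb_checkpoints` are all assumptions:

```python
from pathlib import Path


def prefix_with_folder(root, sep="_"):
    """Prefix every file in each category folder with that folder's name.

    Hypothetical helper. Files that already start with the prefix are skipped,
    so running it twice is harmless. Returns the number of files renamed.
    """
    renamed = 0
    for folder in Path(root).iterdir():
        if not folder.is_dir() or folder.name.startswith("."):
            continue  # skip stray files and hidden folders like .ipynb_checkpoints
        prefix = folder.name + sep
        # list() first so renaming doesn't disturb the directory iteration
        for f in list(folder.iterdir()):
            if f.is_file() and not f.name.startswith(prefix):
                f.rename(folder/(prefix + f.name))
                renamed += 1
    return renamed
```

Called as `prefix_with_folder(path)` right after downloading, it gives the same effect as the script above with a separator, e.g. teddy/teddy_000001.jpg.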

Kerner · September 2, 2021, 4:31am

Hi,
You can also change the file name before moving it to the new folder.

Here I just prepend the current folder name to the file name, so you can replace this code:

for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)

with this code:

import os, shutil
for idx, cat in cleaner.change():
    f = cleaner.fns[idx]
    # prepend the current folder name, e.g. teddy/000001.jpg -> teddy/teddy000001.jpg
    new_f = f.parent/(f.parent.name + f.name)
    os.rename(f, new_f)
    shutil.move(str(new_f), path/cat)

Then you can keep cleaning your dataset without running into this error.