Chapter 2: Using DuckDuckGo to build your dataset

Hi everyone! Based on the code provided in this colab notebook, I created a function that builds a dataset using DuckDuckGo instead of Bing Image Search. You can also find the notebook in the course section for Downloading images.

@jeremy if you think this is a good idea, could we add this code to the colab notebook referenced above?

from fastbook import *

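# Download images for each image type (e.g. 'grizzly bear', 'black bear', 'teddy bear')
def download_images_ddg(img_category, img_types, num_images):
    # Derive the root folder from the category instead of relying on a global `path`
    path = Path(img_category)
    # Skip the download entirely if the root folder already exists
    if path.exists():
        return
    path.mkdir()
    for o in img_types:
        dest = path/o
        dest.mkdir(exist_ok=True)
        results = search_images_ddg(f'{o} {img_category}', max_images=num_images)
        for i, url in enumerate(results, start=1):
            try:
                download_url(url=url,
                             dest=f'{dest}/{o}-{i}.jpg',
                             timeout=400,
                             show_progress=False)
            except Exception:
                # Report the failed URL and move on to the next one
                print(f'not found {url}')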

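We can call the function with the code below. It downloads up to 100 images (set by the num_images argument) for each image type from DuckDuckGo. If the bear folder already exists, the function does nothing, so delete or rename the folder first if you want a fresh download.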

# Define image category and image types
img_category = 'bear'
img_types = ['grizzly','black','teddy']
path = Path(img_category)

# Call function to download images
download_images_ddg(img_category, img_types, num_images=100)

The function download_images_ddg() creates a folder named bear in your current directory. The bear folder contains 3 subfolders based on img_types (grizzly, black, teddy), each containing up to 100 images (fewer if the search returns fewer results or some downloads fail).
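Since some downloads can fail, it can be useful to check how many files actually landed in each subfolder. Here is a minimal sketch using only the standard library; count_images_per_type is a name I made up for illustration, it is not part of fastbook.

```python
from pathlib import Path

def count_images_per_type(root):
    """Return {subfolder name: number of files} for each subfolder of root."""
    root = Path(root)
    counts = {}
    # Only direct subfolders are counted; loose files in root are ignored
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[sub.name] = sum(1 for f in sub.iterdir() if f.is_file())
    return counts
```

Calling count_images_per_type('bear') after the download should give a dict with the keys grizzly, black and teddy.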

The following code finds images that failed to download properly or are corrupted (that is, files that can't be opened as images) and deletes them.

# Check for failed images
fns = get_image_files(path)
failed = verify_images(fns)
failed.map(Path.unlink);
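One thing to keep in mind: download_images_ddg() saves every file with a .jpg extension, whatever the server actually returned. verify_images() does the real check by opening each file. The sketch below is a much cruder stand-in and not how fastai does it: it only looks at the first bytes of a file to guess the format. sniff_image_format and SIGNATURES are illustrative names of my own.

```python
# Leading bytes ("magic numbers") of a few common image formats
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

def sniff_image_format(fn):
    """Guess the image format from the first bytes of the file, or return None."""
    with open(fn, "rb") as f:
        header = f.read(8)
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    # Unknown signature: another format, an HTML error page, an empty file, ...
    return None
```

A None result does not necessarily mean the file is broken, only that it is not one of the three formats listed here.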

The following code deletes the created folders and their images. After running it, enter the letter y at the prompt to confirm the deletion; any other answer leaves everything in place.

# Delete the dataset folder, including its subfolders and images
import shutil

def delete_directory(user_input, img_category, img_types):
    # img_types is no longer needed but is kept so existing calls still work
    # Only delete when the user answers 'y' (case-insensitive)
    if user_input.strip().lower() == 'y':
        try:
            # rmtree removes the main folder together with every image type subfolder
            shutil.rmtree(img_category)
        except FileNotFoundError:
            print('No directories found')

# Call delete_directory function to delete folders and images
user_input = input('Would you like to delete directories? [y/n] ')
delete_directory(user_input, img_category, img_types)
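Because the deletion can't be undone, you may want to see how much would be removed before answering y. Here is a small sketch with the standard library; summarize_directory is a hypothetical helper of mine, not something from fastbook.

```python
from pathlib import Path

def summarize_directory(root):
    """Return (file_count, total_bytes) for all files under root, recursively."""
    root = Path(root)
    if not root.exists():
        return 0, 0
    files = [p for p in root.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)
```

For example, summarize_directory(img_category) gives the number of files and their combined size in bytes, which you could print just before the prompt.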

Hope this helps you create your own datasets using DuckDuckGo :slight_smile:


The code is already in fastbook :slight_smile:

link


Thanks, @jeremy!

Maybe I should rephrase what I did just to be clear. I’m using the function search_images_ddg() provided by fastbook. The function I developed helps create the folder and subfolders and populates them with images using the function search_images_ddg(). Something similar was done in the Bing Image Search section of the chapter 2 notebook, and I wanted to do the same for downloading images using DuckDuckGo.

Thank you again for your response, I really appreciate it :smile:
