Hi everyone! Based on the code in this colab notebook, I created a function that builds an image dataset using DuckDuckGo instead of Bing Image Search. You can also find the notebook in the course section for Downloading images.
@jeremy if you think this is a good idea, could we add this code to the colab notebook referenced above?
from fastbook import *

# Download images for each type of a given category (e.g. grizzly, black and teddy bears)
def download_images_ddg(img_category, img_types, num_images):
    # Images are stored under ./<img_category>/<img_type>/
    path = Path(img_category)
    path.mkdir(exist_ok=True)
    for o in img_types:
        dest = path/o
        dest.mkdir(exist_ok=True)
        results = search_images_ddg(f'{o} {img_category}', max_images=num_images)
        for i, url in enumerate(results, start=1):
            try:
                download_url(url=url,
                             dest=f'{dest}/{o}-{i}.jpg',
                             timeout=400,
                             show_progress=False)
            except Exception:
                # Skip URLs that cannot be downloaded
                print(f'not found {url}')
We can call the function with the code below. It downloads up to 100 images (as specified by the function argument num_images) for each image type from DuckDuckGo.
# Define image category and image types
img_category = 'bear'
img_types = ['grizzly','black','teddy']
path = Path(img_category)
# Call function to download images
download_images_ddg(img_category, img_types, num_images=100)
The function download_images_ddg() creates a folder named bear in your current directory. The bear folder contains 3 subfolders based on img_types (grizzly, black, teddy), each containing up to 100 images (fewer if some downloads fail).
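To check how many images actually ended up in each subfolder, something like the sketch below could be used. The helper name count_images_per_type and the list of file extensions are my own assumptions for illustration; they are not part of fastbook or the notebook.

```python
from pathlib import Path

def count_images_per_type(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return {subfolder name: number of image files} for each subfolder of root."""
    root = Path(root)
    counts = {}
    # Visit only the direct subfolders of root (one per image type)
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[sub.name] = sum(
            1 for f in sub.iterdir()
            if f.is_file() and f.suffix.lower() in extensions
        )
    return counts
```

After the download has finished, calling count_images_per_type('bear') returns a dictionary with one entry per subfolder, so it is easy to spot an image type that came back with far fewer files than requested.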
The following code finds images that failed to download or are corrupted and deletes them.
# Check for failed images
fns = get_image_files(path)
failed = verify_images(fns)
failed.map(Path.unlink);
The following code deletes the created folders and their images. After running it, enter the letter y at the prompt to confirm the deletion.
# Delete directory that's not empty
import shutil
def delete_directory(user_input, img_category, img_types):
    # Accept both 'y' and 'Y'
    if user_input.lower() == 'y':
        try:
            for t in img_types:
                shutil.rmtree(f'{img_category}/{t}') # delete each image type subfolder
            shutil.rmtree(f'{img_category}') # delete main folder
        except FileNotFoundError:
            print('No directories found')
# Call delete_directory function to delete folders and images
user_input = str(input('Would you like to delete directories? [y/n]'))
delete_directory(user_input, img_category, img_types)
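As a possible simplification, shutil.rmtree on the main folder already removes every subfolder and image inside it, so the loop over img_types is not strictly needed. Below is a minimal sketch of that idea; the function name delete_dataset and its return value are hypothetical and only meant as an illustration.

```python
import shutil
from pathlib import Path

def delete_dataset(root, answer):
    """Delete root and everything under it if answer is 'y' (case-insensitive).

    Returns True if the folder was deleted, False otherwise.
    """
    root = Path(root)
    if answer.strip().lower() != "y":
        return False  # user declined, nothing is touched
    if not root.is_dir():
        print(f"No directory found at {root}")
        return False
    shutil.rmtree(root)  # removes all subfolders and images in one call
    return True
```

It could be called as delete_dataset('bear', input('Would you like to delete directories? [y/n] ')). Passing the answer in as an argument keeps the function easy to test without typing at a prompt.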
Hope this helps you create your own datasets using DuckDuckGo.