I’m starting this thread for Part 1 (2020) to discuss approaches for creating image datasets for vision learning. (There are other threads for earlier classes.)
I’ll start with https://github.com/prairie-guy/ai_utilities, a GitHub repository I wrote that contains several Python functions useful with fastai.
image_download() uses your choice of search engine to download a specified number of images. The default search engine is Bing. Each search engine imposes its own maximum number of downloads. (I’m working on raising this limit by using multiple date ranges.)
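The date-range idea rests on simple arithmetic: if an engine caps each query at N results, running the same query over k non-overlapping date ranges can return up to k × N images (fewer once duplicates are removed). Below is a minimal sketch of just the splitting step. It assumes the search engine can restrict a query to a date window. `split_date_range` is a hypothetical helper, not part of ai_utilities or icrawler.

```python
from datetime import date, timedelta


def split_date_range(start, end, n):
    """Split the inclusive range [start, end] into n consecutive,
    non-overlapping sub-ranges of roughly equal length.

    Hypothetical helper: assumes each (start, end) pair would later be
    passed to a search engine as a date filter, one query per window.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    total_days = (end - start).days + 1  # inclusive day count
    if total_days < n:
        raise ValueError("range has fewer days than requested windows")
    ranges = []
    for i in range(n):
        # Integer division spreads any leftover days across the windows
        lo = start + timedelta(days=i * total_days // n)
        hi = start + timedelta(days=(i + 1) * total_days // n - 1)
        ranges.append((lo, hi))
    return ranges
```

For example, `split_date_range(date(2020, 1, 1), date(2020, 12, 31), 4)` yields four roughly quarter-long windows. The first runs from January 1 to March 31.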
Flickr requires an API key, but one is easy to obtain for non-commercial use. With it, the download limit is higher than with the other search engines.
Searching with Google does not currently work because of a bug in the upstream package, icrawler. (I have a fix and have submitted a pull request. If anyone wants the fix, let me know.)
Installation
git clone https://github.com/prairie-guy/ai_utilities.git
pip install icrawler
pip install python-magic
To use ai_utilities from your Python code, include the following lines (sorry, there is no easy install with pip). Replace 'your-parent-directory-of-ai_utilities' with the directory that contains the cloned ai_utilities folder, for example /home/prairieguy/, not the ai_utilities folder itself.
import sys
sys.path.append('your-parent-directory-of-ai_utilities')
from ai_utilities import *
from pathlib import Path
from fastai.vision.all import *
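A common stumbling block is choosing which directory to append. It must be the folder that contains the cloned ai_utilities folder, not ai_utilities itself. Here is a small sketch that checks this before appending. `add_ai_utilities_parent` is a hypothetical helper and is not provided by the repository.

```python
import sys
from pathlib import Path


def add_ai_utilities_parent(parent):
    """Append the directory that *contains* the cloned ai_utilities folder
    to sys.path, so that `from ai_utilities import *` can find it.

    Hypothetical helper for illustration; not part of ai_utilities.
    """
    parent = Path(parent).expanduser().resolve()
    # Fail early if the path points at the wrong level of the tree
    if not (parent / "ai_utilities").is_dir():
        raise FileNotFoundError(f"no ai_utilities folder inside {parent}")
    if str(parent) not in sys.path:  # avoid duplicate entries
        sys.path.append(str(parent))
    return parent
```

Usage would be `add_ai_utilities_parent('/home/prairieguy/')` followed by `from ai_utilities import *`.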
Usage
Here is sample Python code that does the following: downloads up to 100 images of each animal, checks that each image file is a valid JPEG, removes duplicates, saves the images to the directory dataset, and creates data = ImageDataLoaders.from_folder(...). Optionally, it then creates an ImageNet-style directory structure.
import sys
sys.path.append('your-parent-directory-of-ai_utilities')
from ai_utilities import *
from pathlib import Path
from fastai.vision.all import *

# Download up to 100 images of each animal into ./dataset (one subfolder per class)
for p in ['dog', 'goat', 'sheep']:
    image_download(p, 100)
path = Path.cwd()/'dataset'
# Hold out a random 20% of the images for validation
data = ImageDataLoaders.from_folder(path, valid_pct=0.2, item_tfms=Resize(224))
# Optionally, create an ImageNet-style directory with train/ and valid/ folders
make_train_valid(path)
# Use those folders for the split instead of a random percentage
data = ImageDataLoaders.from_folder(path, train='train', valid='valid', item_tfms=Resize(224))
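image_download() and make_train_valid() handle these steps themselves, and their internals are not shown here. As a rough, standard-library-only illustration of what the cleanup and the optional split involve, the sketch below does three things. It deletes files that do not start with the JPEG signature bytes FF D8 FF. It deletes byte-identical duplicates using a SHA-256 hash. It moves each class folder's images into train/ and valid/ subfolders, which is the layout the second from_folder call expects. All function names are hypothetical. python-magic, installed above, is a file-type detection library; this sketch avoids it and reads the signature bytes directly instead. It only catches exact duplicates, not resized or re-encoded copies. It moves files, whereas make_train_valid might copy them; that detail is not stated here.

```python
import hashlib
import random
import shutil
from pathlib import Path

JPEG_SIGNATURE = b"\xff\xd8\xff"  # first three bytes of every JPEG file


def is_jpeg(path):
    """True if the file begins with the JPEG signature bytes."""
    with open(path, "rb") as f:
        return f.read(3) == JPEG_SIGNATURE


def clean_class_dir(class_dir):
    """Delete non-JPEG files and exact duplicates in one class folder.
    Returns the number of files kept. Hypothetical helper."""
    seen = set()
    kept = 0
    for p in sorted(Path(class_dir).iterdir()):
        if not p.is_file():
            continue
        if not is_jpeg(p):
            p.unlink()  # not a JPEG: remove it
            continue
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        if digest in seen:
            p.unlink()  # byte-identical to a file already kept
        else:
            seen.add(digest)
            kept += 1
    return kept


def split_train_valid(path, valid_pct=0.2, seed=42):
    """Move dataset/<class>/* into dataset/train/<class> and
    dataset/valid/<class>. Hypothetical stand-in, not make_train_valid."""
    path = Path(path)
    rng = random.Random(seed)  # fixed seed gives a reproducible split
    classes = [d for d in sorted(path.iterdir())
               if d.is_dir() and d.name not in ("train", "valid")]
    for cls in classes:
        files = sorted(f for f in cls.iterdir() if f.is_file())
        rng.shuffle(files)
        n_valid = int(len(files) * valid_pct)
        for i, f in enumerate(files):
            split = "valid" if i < n_valid else "train"
            dest = path / split / cls.name
            dest.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(dest / f.name))
        if not any(cls.iterdir()):
            cls.rmdir()  # remove the now-empty class folder
```

In the sample above, you would call clean_class_dir(path/'dog') and so on for each class after downloading. You would then call split_train_valid(path) where make_train_valid(path) appears.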
