Download Google Images for ImageDataBunch.from_folder

I wrote this Python module, [ai_utilities], to simplify downloading images for use with ImageDataBunch.from_folder(...).


A set of scripts useful with lectures and libraries. The most common use case is downloading images for training vision models.


  • Anaconda
  • conda install selenium
  • Download the current Geckodriver (Firefox) release for your OS.
  • wget
  • tar xfvz geckodriver-v0.24.0-linux64.tar.gz
  • mv geckodriver ~/bin/, where ~/bin is in PATH
  • git clone
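
Once the steps above are done, a quick check can confirm that the driver is actually reachable. This is a minimal sketch, not part of ai_utilities: find_executable is a hypothetical helper name, and it only relies on the standard library's shutil.which.

```python
import shutil


def find_executable(name: str = "geckodriver"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


location = find_executable()
if location is None:
    # Typical cause: ~/bin (or wherever geckodriver was moved) is not in PATH.
    print("geckodriver not found on PATH")
else:
    print(f"geckodriver found at {location}")
```

If the lookup returns None, the directory used in the mv step is probably missing from PATH in the current shell or notebook session.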

Example Usage

Download 500 images of each class, check that each image is a valid jpeg, save them to the directory dataset, create an imagenet-type directory structure, and create data = ImageDataBunch.from_folder(...):

from ai_utilities import *
path = Path.cwd()/'dataset'
pets = ['dog', 'cat', 'goldfish', 'tortoise', 'snake']
for p in pets:
    image_download(p, 500, timeout=.1)   # images are saved under dataset/ by default
make_train_valid(path)                   # build the train/valid directory structure
data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=224, bs=64).normalize(imagenet_stats)


Downloads a specified number of images (typically limited to 1000) from a specified search engine. By default, images are saved to the directory dataset.

image_download(searchtext:str, num_images:int, engine:str='google', gui:bool=False, timeout:float=0.3)
Select, search, download and save a specified number of images using a choice of search engines, currently `google` or `bing`. (Downloaded images are checked to be valid `jpeg` files.)

positional arguments:
  searchtext            Text to search for
  num_images            Number of images to download
optional arguments:
  gui=False             Show the browser GUI while searching
  engine='google'       Search engine {google|bing}
  timeout=0.3           Timeout for requests (may need tuning for your connection)
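
To illustrate what a "valid jpeg" check can look like, here is a minimal sketch that inspects the file signature. It is an assumption for illustration only and is not how ai_utilities performs its check; looks_like_jpeg and remove_non_jpegs are hypothetical helper names.

```python
from pathlib import Path

# Every JPEG starts with the start-of-image marker FF D8, followed by FF.
JPEG_SIGNATURE = b"\xff\xd8\xff"


def looks_like_jpeg(path: Path) -> bool:
    """Return True if the file starts with the JPEG signature."""
    with open(path, "rb") as f:
        return f.read(3) == JPEG_SIGNATURE


def remove_non_jpegs(folder: Path) -> list:
    """Delete files in `folder` that do not look like JPEGs; return their names."""
    removed = []
    for p in sorted(folder.iterdir()):
        if p.is_file() and not looks_like_jpeg(p):
            p.unlink()
            removed.append(p.name)
    return removed
```

A signature check is cheap but shallow: it catches HTML error pages or GIFs saved with a .jpg extension, not truncated or corrupt image data.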

From a directory containing sub-directories, each with a different class of images, make an imagenet-type directory structure.

It randomly copies files from labels_dir to the sub-directories train, valid and test, creating an imagenet-type directory usable by ImageDataBunch.from_folder(dir,...)

make_train_valid(labels_dir:Path, train:float=.8, valid:float=.2, test:float=0)
positional arguments:
  labels_dir   Contains at least two directories of labels, each containing files of that label
optional arguments:
  train=.8     fraction of files for training,   default=.8
  valid=.2     fraction of files for validation, default=.2
  test=0       fraction of files for testing,    default=0
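
The random split described above can be sketched roughly as follows. This is not the module's implementation: split_into_folders is a hypothetical name, the seed parameter is added only to make the example reproducible, and placing train/valid/test inside labels_dir is an assumption based on the usage example.

```python
import random
import shutil
from pathlib import Path


def split_into_folders(labels_dir: Path, train: float = 0.8, valid: float = 0.2,
                       test: float = 0.0, seed: int = 0) -> None:
    """Copy labels_dir/<label>/* into labels_dir/{train,valid,test}/<label>/."""
    assert abs(train + valid + test - 1.0) < 1e-6, "fractions must sum to 1"
    rng = random.Random(seed)
    splits = ("train", "valid", "test")
    # Every sub-directory that is not already a split folder is treated as a label.
    label_dirs = [d for d in sorted(labels_dir.iterdir())
                  if d.is_dir() and d.name not in splits]
    for label_dir in label_dirs:
        files = sorted(f for f in label_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n = len(files)
        n_train = round(n * train)
        # With no test split, validation takes everything that is left over.
        n_valid = n - n_train if test == 0 else min(round(n * valid), n - n_train)
        parts = {
            "train": files[:n_train],
            "valid": files[n_train:n_train + n_valid],
            "test": files[n_train + n_valid:],
        }
        for split, members in parts.items():
            if not members:
                continue  # do not create empty split folders
            dest = labels_dir / split / label_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in members:
                shutil.copy2(f, dest / f.name)
```

Files are copied rather than moved, so the original label folders stay intact and the split can be redone with different fractions.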

For example, given a directory:


Creates the following directory structure:


I am trying to use this in my Google Colab notebook. A few issues I faced and addressed:

  • Installing imagemagick; otherwise python-magic keeps complaining:
    !apt install imagemagick and libimagmagic-dev

After that,
    from ai_utilities import *

followed by the image_download call shown in the traceback throws the following error. I would appreciate your help.
2020-04-20 23:28:09,833 - INFO - downloader - thread downloader-006 exit
2020-04-20 23:28:10,820 - INFO - icrawler.crawler - Crawling task done!

FileNotFoundError Traceback (most recent call last)
in ()
1 from ai_utilities import *
2 path=Path.cwd()/'dataset'
----> 3 image_download('mango',1)

3 frames
/usr/lib/python3.6/ in wrapped(pathobj, *args)
385 @functools.wraps(strfunc)
386 def wrapped(pathobj, *args):
--> 387 return strfunc(str(pathobj), *args)
388 return staticmethod(wrapped)

FileNotFoundError: [Errno 2] No such file or directory: '/content/dataset/mango'
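
One thing that might be worth ruling out, assuming the error only means the per-class folder was never created (a guess, not a confirmed cause): create the folder before calling image_download. ensure_class_dir below is a hypothetical helper, not part of ai_utilities.

```python
from pathlib import Path


def ensure_class_dir(root: Path, label: str) -> Path:
    """Create root/label (and any missing parents) and return the path."""
    class_dir = root / label
    class_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return class_dir


# Example, mirroring the path from the traceback:
# ensure_class_dir(Path.cwd() / "dataset", "mango")
```

If the folder exists and the error persists, the crawler may simply have saved no images, which would point to the search or download step instead.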