Utility script for downloading images, sanity checking and organising into train/valid/test folders


(Sergiusz Bleja) #1

As part of working through the first lesson with my own examples, I have created a package called duckgoose to streamline the manual process. The package

  • downloads images for specific classes using google images download
  • sanity checks that images can be opened and have three channels
  • organises the images into separate folders (train/valid/test + classes) as expected by the fast.ai library
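For anyone curious what the organising step involves, here is a minimal sketch using only the standard library. It is not duckgoose's actual code: the function name `organise`, the split fractions, and the exact `<split>/<class>/` layout are assumptions made for illustration.

```python
import random
import shutil
from pathlib import Path


def organise(src_dir, dest_dir, valid_frac=0.2, test_frac=0.1, seed=0):
    """Copy src_dir/<class>/* into dest_dir/{train,valid,test}/<class>/.

    Hypothetical helper: names, fractions and layout are assumptions,
    not the duckgoose API.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {"train": 0, "valid": 0, "test": 0}

    # Each sub-folder of src_dir is treated as one class.
    for class_dir in sorted(p for p in src_dir.iterdir() if p.is_dir()):
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(files)

        n_valid = int(len(files) * valid_frac)
        n_test = int(len(files) * test_frac)
        splits = {
            "valid": files[:n_valid],
            "test": files[n_valid:n_valid + n_test],
            "train": files[n_valid + n_test:],
        }

        for split, members in splits.items():
            target = dest_dir / split / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in members:
                shutil.copy2(f, target / f.name)  # copy, leave originals alone
                counts[split] += 1
    return counts
```

Calling `organise("downloads", "data/ducksgeese")` on a folder with `ducks/` and `geese/` sub-folders would produce `data/ducksgeese/train/ducks`, `data/ducksgeese/valid/ducks`, and so on. Copying rather than moving keeps the raw downloads intact in case the split needs to be redone.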

The package is installable via pip install duckgoose.
The code is available on github: https://github.com/svenski/duckgoose.

The name comes from the first use, trying to differentiate between ducks and geese.

I don’t think it really fits in the fastai package, but I’d be happy to merge it in if people feel differently.
Cheers


(Sergiusz Bleja) #2

The duckgoose package has been updated with functionality to create a chart for a binary classifier. The chart shows heat maps for both classes, indicating which parts of the image activate each class.

Here is an example output:

In addition, @AndrewK improved the image fetching by removing duplicate images, identified by their MD5 hash.
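The idea behind hash-based de-duplication can be sketched roughly as follows. This is an illustration only, not the code that went into duckgoose: the helper names `md5_of` and `remove_duplicates` are made up, and it assumes all images for a class sit in one flat folder.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remove_duplicates(folder):
    """Delete files whose content hash was already seen; return removed paths.

    Hypothetical helper: files are visited in sorted name order, so the
    first file with a given hash is the one that is kept.
    """
    seen = {}
    removed = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        key = md5_of(path)
        if key in seen:
            path.unlink()  # byte-identical to a file we already kept
            removed.append(path)
        else:
            seen[key] = path
    return removed
```

Note that this only catches byte-identical files. The same picture saved at a different size or compression level has a different MD5 hash and would be kept.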


(Jana R) #3

Fantastic utility, thank you Sergiusz! - I installed and used this from Paperspace, to download Trees & Shrubs images for Lesson 1.