Utility script for downloading images, sanity checking them and organising them into train/valid/test folders

While working through the first lesson with my own examples, I created a package called duckgoose to streamline the otherwise manual process of building an image dataset. The package:

  • downloads images for specific classes from Google Images (using google images download)
  • sanity checks that images can be opened and have three channels
  • organises the images into separate folders (train/valid/test + classes) as expected by the fast.ai library
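The sanity check and the folder layout can be sketched roughly as follows. This is not duckgoose's own code or API: the function names has_three_channels and split_into_folders, the 70/20/10 split, the fixed seed and the use of Pillow are all assumptions for illustration. It expects one subfolder per class under a source directory and copies the usable images into train/valid/test/<class>/.

```python
import random
import shutil
from pathlib import Path

from PIL import Image


def has_three_channels(path: Path) -> bool:
    """Hypothetical check: True if the file opens as an image with exactly three bands."""
    try:
        with Image.open(path) as img:
            img.verify()  # integrity check without decoding all pixel data
        # verify() leaves the image object unusable, so reopen it to read the bands
        with Image.open(path) as img:
            return len(img.getbands()) == 3  # e.g. RGB; rejects L, P, RGBA, CMYK
    except (OSError, SyntaxError):
        # Unreadable or truncated files (Pillow raises OSError subclasses for these)
        return False


def split_into_folders(src_root: Path, dest_root: Path,
                       fractions=(0.7, 0.2, 0.1), seed: int = 42) -> None:
    """Hypothetical helper: copy src_root/<class>/* into dest_root/{train,valid,test}/<class>/."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    for class_dir in sorted(p for p in src_root.iterdir() if p.is_dir()):
        # Keep only files that pass the sanity check
        images = sorted(p for p in class_dir.iterdir()
                        if p.is_file() and has_three_channels(p))
        rng.shuffle(images)
        n_train = int(len(images) * fractions[0])
        n_valid = int(len(images) * fractions[1])
        splits = {
            "train": images[:n_train],
            "valid": images[n_train:n_train + n_valid],
            "test": images[n_train + n_valid:],  # the remainder goes to test
        }
        for split, files in splits.items():
            out_dir = dest_root / split / class_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy2(f, out_dir / f.name)
```

With ten good images per class this would put 7 in train, 2 in valid and 1 in test. Copying rather than moving keeps the raw downloads intact, so the split can be redone with different fractions.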

The package is installable via pip install duckgoose.
The code is available on GitHub: https://github.com/svenski/duckgoose.

The name comes from the first use, trying to differentiate between ducks and geese.

I don’t think it really fits in the fastai package, but I’d be happy to merge it into fastai if people feel differently.
Cheers

The duckgoose package now includes functionality to create a chart for a binary classifier, showing a heat map for each of the two classes that indicates which parts of the image activate that class.
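For readers curious how such per-class heat maps can be computed, one common technique is class activation mapping (CAM). The sketch below is an assumption, not necessarily what duckgoose does: it presumes a network whose last convolutional layer feeds global average pooling and then a single linear layer, and the function names are hypothetical. Networks with a different head usually need Grad-CAM instead.

```python
import numpy as np


def class_activation_map(features: np.ndarray, fc_weights: np.ndarray,
                         class_idx: int) -> np.ndarray:
    """
    features:   (K, H, W) activations of the last convolutional layer for one image
    fc_weights: (num_classes, K) weights of the linear layer after global average pooling
    Returns an (H, W) map scaled to [0, 1] for the requested class.
    """
    # Weighted sum of the K feature maps, using the chosen class's weights
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # shape (H, W)
    cam = np.maximum(cam, 0)  # keep only regions that push the class score up
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def upsample_nearest(cam: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling so the map can be overlaid on the input image."""
    return np.kron(cam, np.ones((factor, factor)))
```

For a binary classifier, calling class_activation_map once with class_idx=0 and once with class_idx=1 gives one heat map per class, which can then be upsampled and drawn over the original image side by side.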

Here is an example output:

In addition, @AndrewK improved the image fetching by removing duplicate images based on their MD5 hashes.
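A minimal sketch of MD5-based de-duplication, assuming the downloaded images for one class sit in a single folder. The function names are hypothetical and this is not @AndrewK's actual change. Files with identical MD5 digests are treated as duplicates, so it only catches byte-for-byte copies; the same picture re-encoded or resized will hash differently.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex MD5 digest of a file, read in chunks to keep memory use flat."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def remove_duplicate_images(folder: Path) -> list[Path]:
    """Delete files whose content duplicates an earlier file (sorted order); return removed paths."""
    seen: dict[str, Path] = {}
    removed: list[Path] = []
    for path in sorted(p for p in folder.iterdir() if p.is_file()):
        digest = md5_of_file(path)
        if digest in seen:
            path.unlink()  # identical bytes already kept under another name
            removed.append(path)
        else:
            seen[digest] = path
    return removed
```

MD5 is adequate here because the hash is only used to spot accidental duplicates, not for security.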

Fantastic utility, thank you Sergiusz! I installed and used it on Paperspace to download Trees & Shrubs images for Lesson 1.