I’ve written (or tweaked) several utility scripts that I have used for the first several lectures. Nothing too fancy. Perhaps others can post scripts that they have found useful.
ai_utilities
A set of scripts useful for deep learning and AI, originally written for use with the fast.ai
lectures and libraries.
image_download.py
Download images (typically limited to 1000) from a specified search engine, currently Google or Bing.
image_download.py is useful in several respects:
- Because it uses Selenium, it is not limited by the search engine API and generally allows more images to be downloaded.
- It can operate in headless mode, which means it can be used on a server without access to a GUI browser.
- The default browser is Firefox. The script can be modified to use other browsers, such as Chrome.
usage: image_download.py [-h] [--gui] [--engine {google,bing}]
                         searchtext num_images

Select, search, download and save a specified number images using a choice of
search engines

positional arguments:
  searchtext            Search Image
  num_images            Number of Images

optional arguments:
  -h, --help            show this help message and exit
  --gui, -g             Use Browser in the GUI
  --engine {google,bing}, -e {google,bing}
                        Search Engine, default=google
Example: image_download.py 'dog' 200 --engine 'bing' --gui
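The help text above corresponds closely to what Python's argparse generates. As a minimal sketch (not the script's actual source), the interface could be declared as below. It assumes num_images is parsed as an integer, which the help text does not state, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the help text of image_download.py."""
    parser = argparse.ArgumentParser(
        prog="image_download.py",
        description="Select, search, download and save a specified number images "
        "using a choice of search engines",
    )
    # Positional arguments: the search phrase and how many images to fetch.
    parser.add_argument("searchtext", help="Search Image")
    # Assumption: the count is converted to int here rather than later in the script.
    parser.add_argument("num_images", type=int, help="Number of Images")
    # --gui/-g is a boolean switch; without it the browser runs headless.
    parser.add_argument("--gui", "-g", action="store_true", help="Use Browser in the GUI")
    # --engine/-e only accepts the two supported search engines.
    parser.add_argument(
        "--engine", "-e", choices=["google", "bing"], default="google",
        help="Search Engine, default=google",
    )
    return parser


if __name__ == "__main__":
    # Same arguments as the example command above (the shell strips the quotes).
    args = build_parser().parse_args(["dog", "200", "--engine", "bing", "--gui"])
    print(args.searchtext, args.num_images, args.engine, args.gui)  # prints: dog 200 bing True
```

With this declaration, an engine outside {google,bing} makes argparse print an error and exit, and omitting --gui leaves gui set to False (headless).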
Notes:
- Requires Python >= 3.
- Install selenium: conda install selenium or pip install selenium
- Install the other dependencies from conda.
- Install a browser and the browser driver that matches your browser and operating system, and place the driver in a directory on your PATH.
- For example, if using Ubuntu and Firefox:
  tar xfvz geckodriver-v0.19.1-linux64.tar.gz
  mv geckodriver ~/bin/
  where ~/bin is a directory in your PATH.
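Before running image_download.py, it can help to confirm that the driver is actually reachable on PATH. A small sketch using the standard library's shutil.which; find_driver is a hypothetical helper, and geckodriver is assumed as the default name because Firefox is the default browser (chromedriver would be the usual name for Chrome).

```python
import shutil
from typing import Optional


def find_driver(name: str = "geckodriver", path: Optional[str] = None) -> Optional[str]:
    """Return the full path of an executable browser driver, or None if not found.

    `path` optionally overrides the PATH search string (None means use $PATH).
    """
    return shutil.which(name, path=path)


if __name__ == "__main__":
    driver = find_driver()
    print(driver or "geckodriver not found: move it into a directory listed in PATH")
```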
make_train_valid.py
usage: make_train_valid.py [-h] [--train TRAIN] [--valid VALID] [--test TEST]
                           labels_dir

Make a train-valid directory and randomly copy files from labels_dir to
sub-directories

positional arguments:
  labels_dir     Contains at least two directories of labels, each containing
                 files of that label

optional arguments:
  -h, --help     show this help message and exit
  --train TRAIN  files for training, default=.8
  --valid VALID  files for validation, default=.2
  --test TEST    files for testing, default=.0
Example: make_train_valid.py catsdogs --train .75 --valid .20 --test .05
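The splitting step can be sketched as follows. This is not the script's code: it assumes the --train/--valid/--test values are fractions that sum to 1, that each label directory is split separately, and that the output goes to an explicitly given out_dir laid out as split/label/file. The function name make_train_valid (as a Python function), out_dir and seed are hypothetical.

```python
import random
import shutil
from pathlib import Path


def make_train_valid(labels_dir, out_dir, train=0.8, valid=0.2, test=0.0, seed=None):
    """Hypothetical sketch: randomly copy each label's files into split folders.

    labels_dir/<label>/<file> is copied to out_dir/<split>/<label>/<file>.
    Returns a dict mapping each split name to the number of files copied.
    """
    # Assumption: the three fractions are non-negative and sum to 1.
    if min(train, valid, test) < 0 or abs(train + valid + test - 1.0) > 1e-6:
        raise ValueError("train, valid and test must be >= 0 and sum to 1")
    rng = random.Random(seed)
    counts = {"train": 0, "valid": 0, "test": 0}
    for label_dir in sorted(p for p in Path(labels_dir).iterdir() if p.is_dir()):
        files = sorted(f for f in label_dir.iterdir() if f.is_file())
        rng.shuffle(files)  # random assignment of files to splits
        n = len(files)
        # Cumulative cut points: every file lands in exactly one split.
        cut1 = min(n, round(n * train))
        cut2 = min(n, round(n * (train + valid)))
        splits = {
            "train": files[:cut1],
            "valid": files[cut1:cut2],
            "test": files[cut2:],
        }
        for split, split_files in splits.items():
            if not split_files:
                continue  # don't create empty split directories
            dest = Path(out_dir) / split / label_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in split_files:
                shutil.copy2(f, dest / f.name)  # copy, leaving labels_dir intact
            counts[split] += len(split_files)
    return counts
```

The example command above would correspond roughly to make_train_valid("catsdogs", "catsdogs_split", train=.75, valid=.20, test=.05), where "catsdogs_split" is an illustrative output name. Cutting at train and then at train + valid means rounding can never assign a file twice or drop one; whatever remains after the second cut goes to test.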
filter_img.sh
Uses the file command to determine the type of each picture, then filters (keeps) only pictures of a specified type.
Images are filtered in place, i.e., non-JPEG files are deleted. (This can be modified within the script.)
Usage: filter_img.sh image_directory
Example: filter_img.sh dogs/
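filter_img.sh relies on the file command. As a rough Python equivalent (a swapped-in technique, not the script's method), the sketch below checks each file for the JPEG signature bytes FF D8 FF at its start and deletes everything else, mirroring the script's default of keeping only JPEGs. filter_images and its dry_run option are hypothetical.

```python
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files begin with these three bytes


def is_jpeg(path: Path) -> bool:
    """Return True if the file's first bytes match the JPEG signature."""
    with open(path, "rb") as fh:
        return fh.read(3) == JPEG_MAGIC


def filter_images(image_dir, dry_run=False):
    """Delete every regular file in image_dir that is not a JPEG (in place).

    Returns the paths that were removed, or that would be removed if dry_run is True.
    """
    removed = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file() and not is_jpeg(path):
            removed.append(path)
            if not dry_run:
                path.unlink()  # irreversible, like the shell script
    return removed
```

Because the deletion cannot be undone, calling filter_images("dogs/", dry_run=True) first lists what would be removed without touching anything.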