Library for image downloads and other file-processing tasks

gsganden · October 25, 2020, 7:52pm

My team created a framework called Wildebeest for downloading and/or processing large numbers of files: https://github.com/ShopRunner/wildebeest. We have found it very useful for our computer vision work.

The library is designed for IO-bound workflows that involve reading files into memory, processing their contents, and writing out the results. It makes running those workflows faster and more reliable by parallelizing across files, handling errors, making it easy to skip files that have already been processed, and keeping organized records of what was done.
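
To make that pattern concrete, here is a rough sketch using only the Python standard library. It is not Wildebeest's API; the names `process_file`, `run`, `skip_existing`, and `n_jobs` are made up for illustration, and the uppercase "processing" step is a placeholder. It just shows the general idea of threading across files, catching per-file errors, skipping outputs that already exist, and returning a record of each file.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_file(in_path, out_path, skip_existing=True):
    """Read one file, transform it, write the result, and return a record."""
    record = {
        "input": str(in_path),
        "output": str(out_path),
        "skipped": False,
        "error": None,
    }
    # Skip work whose output is already on disk.
    if skip_existing and out_path.exists():
        record["skipped"] = True
        return record
    try:
        data = in_path.read_bytes()          # read
        out_path.write_bytes(data.upper())   # placeholder "processing", then write
    except Exception as exc:
        # Record the failure instead of letting one bad file stop the run.
        record["error"] = repr(exc)
    return record


def run(in_paths, out_dir, n_jobs=4):
    """Process files in parallel threads; return one record per input, in order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    def job(path):
        path = Path(path)
        return process_file(path, out_dir / path.name)

    # Threads suit IO-bound work, where most time is spent waiting on disk or network.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(job, in_paths))
```

Calling `run` a second time on the same inputs would mark the files that succeeded as skipped and retry only the ones that failed, since a failed file never gets an output written.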

Wildebeest was developed for deep learning computer vision projects, so in addition to the general framework it also provides predefined components for image processing. However, it can be used for any project that involves processing data from many sources.

See https://medium.com/shoprunner/introducing-wildebeest-a-python-file-processing-framework-e38d652e3bd4 for a quick introduction and https://wildebeest-library.readthedocs.io/en/latest/quickstart.html for a more detailed guide.