I’ve put together a widget that can search for and download images from Google Search and store them on disk, so that folks can play around with simple CNNs on their own dataset ideas!
- It’s built as a part of the existing fast.ai widget system. As easy as
- It works in both Notebook and Lab.
- It’s just 100 LOC.
- I’ve included some examples in the documentation in docs_src and updated the docs, including how to create a data bunch and a learner with the images downloaded that way.
- It allows folks to pick the resolution they want and how many images they want. It should work for 100+ images per label.
- It uses the google_images_download script, which I’m not a huge fan of. It’s under the MIT license.
upd: missed the link: https://github.com/xnutsive/fastai/tree/image_downloader
I can also add some examples to the more novice-facing docs, maybe in the examples folder? I think it makes sense to show this to more novice users; it’s obviously not a super serious research tool and more of a playground feature.
One thing that can be improved is to pluck the algorithm and pieces of code from google_images_download to fetch image URLs, then use the parallel version of basic_data to download them and hook all that up to the fastprogress widget. Not sure if I’ll have time to do that this week or next, though.
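To make the parallel download idea concrete, here is a rough sketch using only the standard library. It is not the code from the branch and not the fastai API: `download_all`, `fetch_bytes` and the numbered file naming are hypothetical names picked for illustration, and it assumes the list of image URLs has already been collected. Progress reporting is left out.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url, timeout=10):
    """Default fetcher (hypothetical helper): return the raw bytes behind a URL."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(urls, dest, max_workers=8, fetch=fetch_bytes):
    """Download every URL into `dest` using a thread pool.

    Returns (saved_paths, failed_urls). A failed download is recorded
    rather than raised, so one dead link does not stop the batch.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)

    def job(index, url):
        # Keep the URL's extension when it has one; assume .jpg otherwise.
        suffix = Path(urlparse(url).path).suffix or ".jpg"
        path = dest / f"{index:05d}{suffix}"
        path.write_bytes(fetch(url))
        return path

    saved, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(job, i, u): u for i, u in enumerate(urls)}
        for future in as_completed(futures):
            try:
                saved.append(future.result())
            except Exception:
                failed.append(futures[future])
    return sorted(saved), failed
```

Passing `fetch` as a parameter keeps the network call swappable, so a progress callback or a retrying fetcher could be dropped in later without touching the pool logic.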
Another thing to work on would be to check all the images after they’ve been downloaded and auto-delete the broken ones; otherwise users might run into problems with DataLoaders, and with num_workers > 0 those are tricky to debug.
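A cheap first pass at that cleanup could look like the sketch below. It only checks the first bytes of each file against the JPEG, PNG and GIF signatures, so it catches HTML error pages and empty files but not truncated images; decoding each file with an image library would be stricter. `looks_like_image` and `delete_broken` are hypothetical names, and the sketch assumes the folder contains nothing but downloaded images (any other file would be deleted too).

```python
from pathlib import Path

# Leading bytes of the formats we expect from an image search.
SIGNATURES = (
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"GIF87a", b"GIF89a",   # GIF
)


def looks_like_image(path):
    """Return True if the file starts with a known image signature."""
    try:
        with open(path, "rb") as f:
            head = f.read(8)
    except OSError:
        return False
    return head.startswith(SIGNATURES)


def delete_broken(folder, dry_run=False):
    """Delete files under `folder` that fail the signature check.

    Returns the list of offending paths; with dry_run=True nothing is
    deleted, which is handy for inspecting what would go.
    """
    removed = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and not looks_like_image(path):
            removed.append(path)
            if not dry_run:
                path.unlink()
    return removed
```

Running it once with `dry_run=True` right after the download step shows what would be removed before any DataLoader gets to see the files.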
@lesscomfortable, what do you think? Let me know if you’d like to tweak this a bit, or if we can merge this and tweak it on the go.