Thought about this for a bit more.
It seems like adding a separate max-size input would confuse novice users: they already have the search size (e.g. 400x300, 800x600, etc.), and one more size input might be too much.
As for just adding a max_size option to download_google_images(): the change would be so small and simple that it’s not even worth putting into the download function. You can just do this:
path = Path("data")
download_google_images(path, "cats", n_images=100)
verify_images(path/"cats", max_size=224)  # images are saved under data/cats
Alternatively, if you’re using the widget rather than calling download_google_images() directly, you can run verify_images() on the download folder with max_size=224 in the next cell of the notebook and get the same result.
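For intuition, here is a rough sketch of what a max-size resize step typically does: only images whose longer side exceeds the limit are shrunk, and the aspect ratio is kept. This is not fastai’s verify_images code; fit_within and shrink_image_file are hypothetical helpers, and shrink_image_file assumes Pillow is installed.

```python
def fit_within(width: int, height: int, max_size: int) -> tuple:
    """Return (width, height) scaled so the longer side is at most max_size.

    Images that already fit are returned unchanged; the aspect ratio is kept.
    Integer arithmetic avoids off-by-one errors from float rounding.
    """
    longest = max(width, height)
    if longest <= max_size:
        return width, height
    # Never shrink a side to 0 pixels, even for extreme aspect ratios.
    return max(1, width * max_size // longest), max(1, height * max_size // longest)


def shrink_image_file(file_path: str, max_size: int = 224) -> None:
    """Hypothetical helper: overwrite an image file with a downsized copy."""
    from PIL import Image  # imported lazily so fit_within works without Pillow

    with Image.open(file_path) as img:
        new_size = fit_within(img.width, img.height, max_size)
        if new_size == img.size:
            return  # already small enough, leave the file untouched
        resized = img.resize(new_size, Image.BILINEAR)
    # Save after the source file is closed, reusing the original path/format.
    resized.save(file_path)
```

For example, with max_size=224 an 800x600 image becomes 224x168, while a 200x150 image is left as-is.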
However, I’m not 100% sure. If resizing everything before training gives a good training performance boost, I’ll add something for it so that novice users get better results out of the box and feel more encouraged to pursue their projects.
One way I think the widget could be improved is by adding docs and examples on how to clean up the new dataset by showing the images that don’t look like the rest of the downloaded images. E.g. if I download 100 polar bear images and get 95 bears and 5 white toy bears, I’d like a one-line snippet that invokes ImageCleaner, shows me the outliers, and asks whether I want to delete or keep them.
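One possible way to pick those outliers, sketched under assumptions: compute a feature vector for each image (e.g. activations from a pretrained CNN, not shown here), then rank images by cosine similarity to the average feature vector; the least similar ones are the candidates to show for review. rank_outliers is a hypothetical helper, not part of ImageCleaner’s API, and the toy 2-D vectors below stand in for real image features.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_outliers(features: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k feature vectors least similar to the centroid.

    Hypothetical helper: `features` is assumed to hold one embedding per image.
    """
    if not features:
        return []
    n, dim = len(features), len(features[0])
    # Average embedding of the whole download, standing in for "a typical image".
    centroid = [sum(f[d] for f in features) / n for d in range(dim)]
    scores = [cosine_similarity(f, centroid) for f in features]
    # Lowest similarity first: these are the images most unlike the rest.
    return sorted(range(n), key=lambda i: scores[i])[:k]


# Toy "embeddings": five bear-like images and one odd one out (index 5).
toy_features = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.95, 0.0], [1.0, 0.1], [0.0, 1.0]]
print(rank_outliers(toy_features, k=1))  # [5]
```

Comparing against a single centroid is the simplest baseline; if a search returns several legitimately different-looking groups, a nearest-neighbour distance might rank outliers better.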
@sgugger what do you think?