Tips for building large image datasets

ueran · February 9, 2020, 8:04pm

Hi,
I’ve been trying to use the duckgoose package, but I’m getting an error while running it:
"
Unfortunately all 100 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!
"
I’m using Gradient/Paperspace.
I ran in the Jupyter terminal -
pip install duckgoose
pip install chromedriver

Any help would be appreciated!
Thanks

svenski · February 9, 2020, 9:02pm

Hi Eran,

That sounds like a message from chromedriver. I don’t know why it occurred, though.
Good luck!

AntonK · March 4, 2020, 12:03pm

Hey everyone. I just watched the 1st lesson, and I tried to make my own image classifier, using the code from lesson 1.
My idea was to classify images of people’s facial expressions (with focus on negative emotions), and for that I tried to scrape Google images using the uppermost method from this thread (google-image-download).

I’m using Colab and I get an error “Unfortunately all 5 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!”

Here’s my code:
https://colab.research.google.com/drive/1ZsWkt1s710JV46P0ao9d_cKW0TYaJSIM#scrollTo=AnARAlJWMYC0

Does anyone have an idea about how to solve the issue?
Tried googling this error and found this thread (https://github.com/hardikvasa/google-images-download/issues/280), where the last few messages are also from frustrated people like me.

gregpaton08 · March 10, 2020, 12:46am

The underlying script to pull images from Google is no longer working. There’s a bug report here:

Edit: Removing old link to a script that did not work for me. I ended up using this script which did work for me. If you have any issues with it just let me know.

TedGraham · March 24, 2020, 7:44pm

google_images_download is currently broken

I built a script to do batch download using a Bing Image API account: https://github.com/TedGraham/fastai-ted

It is similar to the PyImageSearch tutorial but my script:

allows you to specify multiple searches via a text-file
avoids overwriting existing files

It runs out of the box on Google Cloud instances; on some machines you might need to install python3 and Pillow first.
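
The two behaviors listed above (reading several searches from a text file and not overwriting existing files) can be sketched in a few lines of Python. This is not the code from the linked repository, only a minimal illustration. The helper names `read_queries` and `unique_path`, the one-query-per-line file format, and the `_1`, `_2` suffix scheme are all assumptions.

```python
from pathlib import Path


def read_queries(text_file):
    """Return one search query per non-empty line, skipping '#' comment lines.

    The file format is an assumption made for this sketch.
    """
    lines = Path(text_file).read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


def unique_path(folder, stem, suffix=".jpg"):
    """Return a path inside `folder` that does not exist yet.

    Tries stem.jpg first, then stem_1.jpg, stem_2.jpg, and so on, so an
    earlier download is never overwritten.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    candidate = folder / f"{stem}{suffix}"
    counter = 1
    while candidate.exists():
        candidate = folder / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

A download loop would call `unique_path(class_folder, "img")` before writing each file; the actual image search and download step is deliberately left out here.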

PolarisScouter · March 27, 2020, 8:41pm

Recent reviews for the add-ons are quite bad.

I followed the official tutorial to scrape images using the JavaScript trick here, and it worked well.

loicloic · March 28, 2020, 5:29pm

This fork of google_images_download works. It has not been merged yet, but you can use it in place of the version you get from pip install google-images-download:

However:

I cannot download more than 100 images per search
I cannot use the -wr parameter for some reason, which forces me to change the keyword slightly between searches. That is not great for building a consistent image dataset. I chose to search for different colors of similar objects in order to build it anyway.

harijos · April 2, 2020, 8:33pm

Thanks… this was really useful

am.sharan · April 3, 2020, 9:49am

can you share your notebook for reference

am.sharan · April 3, 2020, 9:51am

can you share your notebook so i can understand what worked

harijos · April 3, 2020, 12:56pm

I have shared the file here: https://github.com/debunker/HousePlantClassifier/blob/master/HPC.ipynb

am.sharan · April 3, 2020, 2:19pm

good thanks

kooc02 · April 5, 2020, 2:45pm

Thank you for posting this! I’m getting a syntax error message when trying to run the downloading step. Any advice?

Kirby · April 15, 2020, 12:35pm

This is a great way to host and serve data. It makes it very easy in the future to edit notebooks to reference separate groups of image data.

vict0ra · April 16, 2020, 12:40pm

The first method doesn’t work for me. It ends with “Unfortunately all 50 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!”. Before running it, you need to install Selenium and chromedriver (I had some version mismatches between them to resolve, etc.).
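
A quick way to confirm that both prerequisites are visible from the notebook is sketched below. It only checks that the Selenium package can be imported and that a driver binary is on PATH. It does not check that the versions are compatible, and it does not fix the download error itself. The function name `check_scraper_prereqs` is made up for this example.

```python
import importlib.util
import shutil


def check_scraper_prereqs(driver_name="chromedriver"):
    """Report whether Selenium is importable and a driver binary is on PATH."""
    return {
        # find_spec returns None when the top-level package is not installed
        "selenium_installed": importlib.util.find_spec("selenium") is not None,
        # shutil.which returns the full path of the executable, or None
        "driver_path": shutil.which(driver_name),
    }


if __name__ == "__main__":
    print(check_scraper_prereqs())
```

If `driver_path` comes back as `None`, the binary is either not installed or not on the PATH of the environment the notebook runs in.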

smajee · April 21, 2020, 1:59am

Let’s start with the positive. The following worked for me.
googliser is a shell script that worked for me (the only mechanism that worked for me in Colab).
Here are the steps (they can also be found in the Git repository):

!apt install imagemagick
!bash <(wget -qO- git.io/get-googliser)
!googliser --phrase "apple" --title 'Apples!' --color 'full' --number 50 --upper-size 100000 -o './data' -G

Here’s what didn’t work for me:

google-images-download. I should have searched the forum earlier.
ai_utilities. It actually almost worked, except that image_download() hits some kind of issue with the path.
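
After a run like the googliser command above finishes, it can help to check what actually landed in the output folder before training. The sketch below counts image files under ./data, grouped by subfolder. It makes no assumption about the folder layout googliser produces. The name `count_images` and the list of file extensions are choices made for this example.

```python
from collections import Counter
from pathlib import Path

# Extensions treated as images in this sketch; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def count_images(root="./data"):
    """Count image files under `root`, grouped by their parent folder.

    Keys are folder paths relative to `root`; files directly in `root`
    are counted under ".".
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            counts[str(path.parent.relative_to(root))] += 1
    return dict(counts)


if __name__ == "__main__":
    for folder, n in sorted(count_images("./data").items()):
        print(f"{folder}: {n} images")
```

In Colab or a Paperspace notebook the same paths are visible to Python, so the folder passed to `-o` can be handed directly to whatever loads the dataset.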

s.j.hatfield · May 19, 2020, 9:01am

This Flickr scraper has worked for me: https://github.com/ultralytics/flickr_scraper. The instructions in the README are clear. Thanks to ultralytics for making it.

Now to train my croc or birkenstock classifier!

akgarg · May 28, 2020, 5:02am

How do I access this data once it has been downloaded in Colab? I’m a bit new here and started the course 2 days ago. Sorry for the silly question.

akgarg · May 28, 2020, 5:39am

Got It

bhavi-san · May 29, 2020, 5:37am

I’m using the Paperspace virtual machine. Will it work there?