Tips for building large image datasets

It looks like Google has updated the DOM, so there is an issue with the repo.
Check this Stack Overflow post

Thanks a lot. googliser works great.

Thanks! So useful!
The only problem is that it says none of the images are downloadable. :frowning:
Has anyone else faced that? What’s the solution?
Thanks


I’m facing the same issue. I looked for troubleshooting documentation in the original repo but there is nothing about it. Maybe it’s related to the chromedriver installation; I’m not sure I did it right in my VM.


Hi Lindy, thanks for sharing. For your information, I ran into an issue using google-images-download, with the output “Unfortunately all 100 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!..” I searched for a solution, and the most likely cause is that Google has changed something that breaks the crawler.


[FIX UPDATE] Some of the issues below still stand, but the official guide DOES still work. If you are having issues, make sure “Internet” is enabled in your Kaggle notebook (facepalm). The interface is a bit different from the other instructions I found:

  1. Click the ‘Kaggle’ icon in the top right (looks like >|)
  2. Click Preferences
  3. Toggle “Internet” on

@joedockrill has also provided a helpful tool below that he wrote and maintains.

----------------------original comment below---------------------------------

I am having the same issue: Unfortunately all 500 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

I tried using this guide as well and had apparently the same problem. Every download attempt got an error like this one:

Error https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcQhPp0TLFJJJVgkpP-LThb46ySlqEL9kvTtHg&usqp=CAU HTTPSConnectionPool(host='encrypted-tbn0.gstatic.com', port=443): Max retries exceeded with url: /images?q=tbn%3AANd9GcQhPp0TLFJJJVgkpP-LThb46ySlqEL9kvTtHg&usqp=CAU (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f259ea9bc90>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
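
The “Temporary failure in name resolution” part of that error means the hostname could not be looked up at all, which is what you would typically see when the notebook has no internet access. A quick way to check is sketched below. It uses only the Python standard library, and `can_resolve` is just a helper name made up for this example:

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if a name lookup for hostname succeeds, False otherwise."""
    try:
        # getaddrinfo performs the lookup; it raises socket.gaierror on failure.
        socket.getaddrinfo(hostname, 443)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    host = "encrypted-tbn0.gstatic.com"  # the host named in the error above
    if can_resolve(host):
        print(f"{host} resolves, so name resolution is working")
    else:
        print(f"{host} does not resolve; check that Internet is enabled")
```

If this reports that the host does not resolve, the download errors are probably a connectivity problem rather than a problem with the scraping code.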

For those of you still looking for solutions, these other resources appear to be useful. The Chrome extension is very user friendly and good for a first project.

How to scrape images

Google changed their page structure, so everything written a while back to scrape it is now broken.

You can search around for something newer (or something which has been subsequently fixed), or you can use my scraper notebook and search on DuckDuckGo instead. It’s less painful.


Currently, this handy Firefox extension works on Google and DuckDuckGo.


It also shows the number of image links you download and saves them in a CSV file.
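
Once you have a CSV of image links, something like the following could fetch them. This is only a rough sketch: it assumes one URL per row in the first column, which may not match what the extension actually writes. It saves everything with a `.jpg` extension regardless of the real image type, and `read_urls` and `download_all` are hypothetical helper names.

```python
import csv
import http.client
import urllib.request
from pathlib import Path


def read_urls(csv_path):
    """Collect http(s) URLs from the first column of a CSV file (assumed layout)."""
    urls = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            # Skip blank rows, header rows, and anything that is not a web URL.
            if row and row[0].strip().startswith(("http://", "https://")):
                urls.append(row[0].strip())
    return urls


def download_all(urls, dest_dir, timeout=10):
    """Try to download each URL into dest_dir; return the URLs that failed."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    failed = []
    for i, url in enumerate(urls):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                # The .jpg extension is an assumption; the real type may differ.
                (dest / f"{i:05d}.jpg").write_bytes(resp.read())
        except (OSError, ValueError, http.client.HTTPException):
            # Covers network errors, timeouts, and malformed URLs.
            failed.append(url)
    return failed
```

Usage would be along the lines of `failed = download_all(read_urls("links.csv"), "images")`, where `links.csv` is a placeholder filename. Checking the length of `failed` afterwards tells you how many links could not be fetched.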