Tips for building large image datasets

[FIX UPDATE] Some of the issues below still stand, but the official guide DOES still work. If you are having issues, make sure “Internet” is enabled in your Kaggle notebook (facepalm). The interface is a bit different from the steps I found described elsewhere:

  1. Click the ‘Kaggle’ icon in top right (looks like >|)
  2. Click Preferences
  3. Toggle “Internet” on

@joedockrill has also provided a helpful tool below that he wrote and maintains.

----------------------original comment below---------------------------------

I am having the same issue. I get this message: "Unfortunately all 500 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!"

I tried using this guide as well and apparently had the same problem. Every download attempt failed with an error like this one:

Error https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcQhPp0TLFJJJVgkpP-LThb46ySlqEL9kvTtHg&usqp=CAU HTTPSConnectionPool(host='encrypted-tbn0.gstatic.com', port=443): Max retries exceeded with url: /images?q=tbn%3AANd9GcQhPp0TLFJJJVgkpP-LThb46ySlqEL9kvTtHg&usqp=CAU (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f259ea9bc90>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
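The "[Errno -3] Temporary failure in name resolution" part means the hostname could not be looked up at all, which is what you would expect from a notebook without internet access. A quick way to check before retrying the downloads is something like the sketch below. The `can_resolve` helper and its `resolver` parameter are names I made up for illustration; only the standard library `socket` module is used.

```python
import socket


def can_resolve(host, resolver=socket.getaddrinfo):
    """Return True if `host` can be resolved, False on a DNS lookup failure.

    `resolver` is a hypothetical hook (not part of any library) that makes
    the check easy to exercise without a network connection.
    """
    try:
        resolver(host, 443)
        return True
    except socket.gaierror:
        # Same class of failure as "Temporary failure in name resolution".
        return False
```

Running `can_resolve("encrypted-tbn0.gstatic.com")` in a notebook cell should return `False` while the lookup is failing. If it returns `True` and the downloads still fail, the cause is probably something other than DNS.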

For those of you still looking for solutions, these other resources appear to be useful. The Chrome extension is very user friendly and good for a first project.

How to scrape images