Tips for building large image datasets

am.sharan: .ipynb using Colab:

Continuing the discussion from Tips for building large image datasets:

https://colab.research.google.com/drive/14Zwx9Uh9p8lf-H18sLoaI3s8ByT5jGk6#scrollTo=gaRW9mnxSiUc

worked.


Hi,

I decided to classify mushrooms and found this website, where observers can upload photos of mushrooms along with the species they believe the mushrooms belong to (and sometimes more or less specific names, e.g. an infraspecific name or stirp). The community also contributes by giving their opinions based on the photos.

Anyway, the maintainers explicitly ask people not to scrape the website but to email them instead. I did so and got a reply within about 5 minutes. 30 minutes later I had access to 10⁶ mushroom images. Great people.


Hi,
I’ve been trying to use the duckgoose package, but I am getting an error when running it:
“Unfortunately all 100 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!”
I’m using Gradient/Paperspace.
I ran the following in the Jupyter terminal:
pip install duckgoose
pip install chromedriver

Any help would be appreciated!
Thanks

Hi Eran,

That sounds like a message from chromedriver. I don’t know why it occurred, though.
Good luck!

Hey everyone. I just watched the first lesson and tried to make my own image classifier using the code from lesson 1.
My idea was to classify images of people’s facial expressions (with a focus on negative emotions), and for that I tried to scrape Google Images using the uppermost method from this thread (google-images-download).

I’m using Colab and I get an error “Unfortunately all 5 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!”

Here’s my code:
https://colab.research.google.com/drive/1ZsWkt1s710JV46P0ao9d_cKW0TYaJSIM#scrollTo=AnARAlJWMYC0

Does anyone have an idea about how to solve the issue?
I tried googling this error and found this thread (https://github.com/hardikvasa/google-images-download/issues/280), where the last few messages are also from frustrated people like me.

The underlying script to pull images from Google is no longer working. There’s a bug report here:

Edit: I removed the old link to a script that did not work for me. I ended up using this script, which did work. If you have any issues with it, just let me know.


google_images_download is currently broken

I built a script to do batch download using a Bing Image API account: https://github.com/TedGraham/fastai-ted

It is similar to the PyImageSearch tutorial but my script:

  • allows you to specify multiple searches via a text-file
  • avoids overwriting existing files

It runs out of the box on Google Cloud instances; on some machines you might need to install python3 and Pillow.
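The two features listed above (one search per line in a text file, and never overwriting an existing file) can be sketched in plain Python. This is not the code from fastai-ted: the file format (one term per line, `#` for comments), the helper names and the `stem_00000.jpg` naming scheme are assumptions, and the actual Bing API call is left out.

```python
from pathlib import Path

# Hypothetical helpers illustrating the two features; not taken from the fastai-ted repo.

def read_searches(path):
    """Return one search term per non-empty line, skipping '#' comment lines (assumed format)."""
    terms = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            terms.append(line)
    return terms

def unique_path(dest_dir, stem, suffix=".jpg"):
    """Return dest_dir/stem_NNNNN.suffix using the first N whose file does not exist yet."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    n = 0
    while True:
        candidate = dest_dir / f"{stem}_{n:05d}{suffix}"
        if not candidate.exists():
            return candidate
        n += 1
```

For example, loop over `read_searches("searches.txt")` and save each downloaded image to `unique_path(Path("data") / term, term.replace(" ", "_"))`, so rerunning a search adds files instead of replacing earlier ones.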


Recent reviews for the add-ons are quite bad.

I followed the official tutorial to scrape images using the JavaScript trick here, and it worked well.
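In the lesson, the JavaScript trick produces a text file with one image URL per line, and fastai's `download_images` then fetches them. The stdlib-only sketch below shows the same idea and is not fastai's implementation: each URL is fetched with `urllib.request`, and failures are recorded and skipped instead of ending the run. `download_urls`, the `.jpg` extension and the default limits are assumptions for illustration.

```python
import shutil
import urllib.request
from pathlib import Path

def download_urls(url_file, dest, max_pics=200, timeout=4):
    """Hypothetical helper: download each URL listed (one per line) in url_file into dest.

    Failed downloads are skipped rather than aborting the whole run,
    because some image hosts refuse requests or time out.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    lines = Path(url_file).read_text(encoding="utf-8").splitlines()
    urls = [u.strip() for u in lines if u.strip()]
    saved, failed = [], []
    for i, url in enumerate(urls[:max_pics]):
        target = dest / f"{i:08d}.jpg"  # extension is assumed; the content type is not checked
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp, open(target, "wb") as out:
                shutil.copyfileobj(resp, out)
            saved.append(target)
        except Exception as exc:  # HTTP errors, timeouts, unknown URL schemes, ...
            target.unlink(missing_ok=True)  # drop any partially written file
            failed.append((url, exc))
    return saved, failed
```

Printing `len(failed)` after a run makes a result like "0 is all we got" easier to diagnose: every URL failing points at the URL list or the network, not at a few bad images.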

This fork of google_images_download works. It has not been merged yet, but you can use it in place of the version installed with pip install google-images-download:

However:

  • I cannot download more than 100 images per search
  • I cannot use the -wr parameter either, for some reason, so I have to vary the keyword slightly between searches, which is not great for building a consistent image dataset. I built mine anyway by searching for different colors of similar objects.
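A rough sketch of the workaround in the last bullet, assuming every variant search is saved into the same class folder: build one query per color, then delete byte-identical files that several overlapping searches returned. `query_variants` and `dedupe_by_hash` are hypothetical helpers, and SHA-256 only catches exact duplicates; resized or re-encoded copies of the same photo would need a perceptual hash instead.

```python
import hashlib
from pathlib import Path

# Hypothetical helpers for merging several capped searches into one class folder.

def query_variants(base, modifiers):
    """Combine a base keyword with modifiers (e.g. colors) to get past a per-search cap."""
    return [base] + [f"{m} {base}" for m in modifiers]

def dedupe_by_hash(folder, patterns=("*.jpg", "*.jpeg", "*.png")):
    """Delete byte-identical images, keeping the first in sorted order; return removed paths."""
    seen, removed = set(), []
    files = sorted(p for pat in patterns for p in Path(folder).glob(pat))
    for path in files:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            path.unlink()
            removed.append(path)
        else:
            seen.add(digest)
    return removed
```

Running `dedupe_by_hash` after all variants have finished keeps the per-class counts honest when the same image shows up under "red car" and "blue car".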

Thanks… this was really useful

Can you share your notebook for reference?

Can you share your notebook so I can understand what worked?

I have shared the file here: https://github.com/debunker/HousePlantClassifier/blob/master/HPC.ipynb

Good, thanks.

Thank you for posting this! I’m getting a syntax error message when trying to run the download step. Any advice?

This is a great way to host and serve data. It makes it very easy in the future to edit notebooks to reference separate groups of image data.

The first method doesn’t work for me. It finishes with “Unfortunately all 50 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!” Before that, you need to install Selenium and chromedriver (I had some version-mismatch errors to solve along the way).

Let’s start with the positive. The following worked for me.
googliser is a shell script, and it was the only mechanism that worked for me in Colab.
Here are the steps (they can be found in the git repository as well):

  1. !apt install imagemagick
  2. !bash <(wget -qO- git.io/get-googliser)
  3. !googliser --phrase "apple" --title 'Apples!' --color 'full' --number 50 --upper-size 100000 -o './data' -G
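When you need one folder per class, the same command can be repeated per phrase from a Python cell. The sketch below is an assumption, not part of googliser: it builds the argument list from a subset of the flags in step 3, prints a copy-pasteable command, and only executes it when `run=True` (which assumes `googliser` is on the PATH after step 2). `googliser_cmd` and `fetch_classes` are hypothetical names.

```python
import shlex
import subprocess

def googliser_cmd(phrase, out_dir, number=50, upper_size=100000):
    """Hypothetical helper: build a googliser argument list using flags from step 3."""
    return ["googliser", "--phrase", phrase, "--number", str(number),
            "--upper-size", str(upper_size), "-o", out_dir]

def fetch_classes(phrases, root="./data", run=False):
    """Print (and optionally run) one googliser call per class, each into root/<class>."""
    cmds = [googliser_cmd(p, f"{root}/{p.replace(' ', '_')}") for p in phrases]
    for cmd in cmds:
        print(shlex.join(cmd))  # straight quotes only, safe to paste into a shell
        if run:
            subprocess.run(cmd, check=True)  # list form: no shell parsing involved
    return cmds
```

Passing a list to `subprocess.run` skips shell parsing entirely, so curly quotes copied from a forum post cannot end up inside the arguments.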

Here what didn’t work for me:

  1. google-images-download. I should have searched the forum earlier.
  2. ai_utilities: it almost worked, except that image_download() hits some kind of issue with the path.

This Flickr scraper has worked for me: https://github.com/ultralytics/flickr_scraper. The instructions in the README are clear. Thanks to Ultralytics for making it.

Now to train my croc or birkenstock classifier!


How do I access this data once it has been downloaded in Colab? I’m a bit new here; I started the course 2 days ago. Sorry for the silly question.
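Files downloaded in Colab live on the runtime’s own disk, relative to the notebook’s working directory (`/content` by default), so ordinary file paths work; they are lost when the runtime is recycled. As a first check, the sketch below counts images per class folder, assuming a layout like `data/<class>/<image>`. `count_images` and the suffix list are illustrative assumptions.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; extend it if your downloader saves other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}

def count_images(root):
    """Hypothetical helper: count image files per immediate subfolder (one subfolder per class)."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            rel = path.relative_to(root)
            label = rel.parts[0] if len(rel.parts) > 1 else "."  # "." = files directly in root
            counts[label] += 1
    return dict(counts)
```

For example, `count_images("data")` in a Colab cell shows whether every class actually received images before you start training.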