Generating image datasets quickly

(Theodoros Galanos) #1

Hello everyone!

I was going to start my second pass on the fastai course yesterday and wanted to follow through the courses with my own case studies this time. While that hasn’t been difficult for structured data (my work involves a lot of that) or language models (shockingly there’s a lot of text out there for all of us), it has been a bit trickier for image datasets.

Yesterday I found a really easy way of going about that. It was all possible with the Google Images Downloader Firefox add-on: https://addons.mozilla.org/en-US/firefox/addon/google-images-downloader/?src=recommended

This amazing little addon allows you to download ALL images on a google image search with the press of a button, even providing a text file including all image links. I know there’s probably dozens of ways to scrape images off the net, but this one was the most satisfying I’ve found to date and wanted to share.

As an example, in a matter of 20 mins I had 12,500 images of different architectural practices (around 700 images of buildings from ~20 firms) for my ‘architect studio classification’ project.
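
In case it helps, here is a rough Python sketch of how the text file of image links could be turned into a local folder of images without the browser. It is only an illustration under a few assumptions: the file holds one URL per line, and the names `read_links`, `target_name` and `download_all` are made up for this example, not part of the add-on.

```python
import http.client
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def read_links(text):
    """Return unique http(s) URLs from the text, one per line, in original order."""
    seen, links = set(), []
    for line in text.splitlines():
        url = line.strip()
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            links.append(url)
    return links


def target_name(index, url):
    """Build a numbered file name, keeping the URL's extension if it looks like an image."""
    ext = Path(urlparse(url).path).suffix.lower()
    if ext not in IMAGE_EXTS:
        ext = ".jpg"  # assumption: fall back to .jpg when there is no usable extension
    return f"{index:05d}{ext}"


def download_all(links_file, out_dir, timeout=10):
    """Download every link listed in links_file into out_dir; skip links that fail."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = read_links(Path(links_file).read_text(encoding="utf-8"))
    saved = 0
    for i, url in enumerate(links):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                (out / target_name(i, url)).write_bytes(resp.read())
            saved += 1
        except (OSError, ValueError, http.client.HTTPException) as err:
            print(f"skipped {url}: {err}")
    return saved
```

Something like `download_all("links.txt", "data/firm_a")` would then fill one class folder, where both paths are placeholders. Expect some links to fail or to return non-image content, so it is worth checking the files afterwards.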

Hope this is useful to some of you!

Kind regards,
Theodore.

16 Likes

(Sudeep) #2

Thanks! I tried this extension and it seems to work, but it only downloaded around 80 images for the search term I had. Any suggestions on how to get it to grab more results for the same search term?

1 Like

(chandan) #3

This has been super helpful, thanks a lot, and kudos to the dev of the add-on. :vulcan_salute:

0 Likes

(chandan) #4

Just scroll to the end of the image results so that 1000-2000 images are previewed before hitting the add-on's button. It will download all of them.

0 Likes

#5

Add this Chrome extension; it’s the easiest way I’ve found so far (as of Feb 2019).

Hope this helps anybody who came here looking for such an approach.

0 Likes

#6

The Firefox add-on “Save Images” works wonders, but it only works on older versions of Firefox or on forks such as Waterfox.

0 Likes

(Shawn Goodin) #7

This is a great tool. I got way better pictures of monkey faces from here than I did on ImageNet. I’m trying to write a model that can distinguish between 3 different types of monkeys.

Couple of questions:

  1. How many images of each kind of monkey do you think are enough to train the model?
  2. Once I get the images, do I need to crop them all to the same size? I could have sworn I watched a video by Jeremy that said they all needed to be normalized to the same size (224, I think) and the same number of pixels. Is there a fastai function that does this for you?
0 Likes

(Theodoros Galanos) #8

I think the typical number people quote for DL is 5,000 data points per class, but with transfer learning these days you can probably get away with far fewer.

Concerning resizing, I’m guessing you can do it while loading the batch, using the transforms. It’s been a while since I did that, so I can’t remember the exact line, but there’s a resize option in there.
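
To make the resizing idea a bit more concrete, here is a small Python sketch of the arithmetic behind one common approach: scale the shorter side to the target size, then crop the centre square. This is not fastai's code, just an illustration; the function name `resize_and_center_crop_box` is made up and 224 is an assumed target size. If I recall correctly, in fastai v1 you get the resizing during loading by passing `size=224` together with `ds_tfms=get_transforms()` when building the `ImageDataBunch`, so you should not need to do this by hand.

```python
def resize_and_center_crop_box(width, height, size=224):
    """Return (new_w, new_h, crop_box) for 'scale shorter side, then centre crop'.

    crop_box is (left, top, right, bottom) in the coordinates of the scaled image.
    """
    if width <= 0 or height <= 0 or size <= 0:
        raise ValueError("width, height and size must be positive")
    # Scale factor that brings the shorter side to exactly `size` pixels.
    scale = size / min(width, height)
    # max() guards against rounding leaving a side one pixel short.
    new_w = max(size, round(width * scale))
    new_h = max(size, round(height * scale))
    # Centre the square crop window inside the scaled image.
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return new_w, new_h, (left, top, left + size, top + size)
```

For a 640x480 photo this scales to 299x224 and keeps the middle 224 columns, so anything near the left and right edges is lost. That matters if the subject is often off-centre.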

0 Likes

(Jetze Baumfalk) #9

Thanks! Very useful for quickly putting together a dataset of badminton vs tennis matches :slight_smile:

1 Like