Kind of a double post (see also Small tool to build image dataset: fastclass).
I wrote a small Python package, fastclass, that tackles two problems I had when building a dataset:
- Easily download multiple image classes from the big search engines without using their (paid) APIs
- Quickly filter the results and mark images for deletion (or assign grades, see below)
For my example I defined 25 search terms (guitars, it’s also in the GitHub repo under examples)…
The first script, fcd, pulls images from Google, Bing or Baidu (or all 3) and resizes them, too (it uses icrawler). Simply write a CSV file where each row contains the search terms you want to push to the search engines.
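As a rough illustration of the CSV step, here is a minimal sketch that writes one search term per row. The file name `guitars.csv`, the helper `write_search_terms` and the three terms are made up for this example. Whether fcd expects a header row or extra columns is not covered here, so compare with the example file in the repo before relying on it.

```python
import csv

# Hypothetical search terms, for illustration only; the real list
# is in the repo's examples folder.
search_terms = [
    "gibson les paul",
    "fender stratocaster",
    "fender telecaster",
]


def write_search_terms(path, terms):
    """Write one search term per row to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for term in terms:
            writer.writerow([term])


# Assumed file name; pass whatever name you like to the download script.
write_search_terms("guitars.csv", search_terms)
```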
Then the second script, fcc, launches a Tkinter GUI where you can quickly flick through the produced folders and mark any file for deletion or sort it into one of several “grades” (grading is optional).
In my case I used 4 grades (and deleted a bunch):
Grade 1: good
Grade 2: only the body of a guitar (still super useful to distinguish between models)
Grade 3: headstock only (not used in first model)
Grade 4: really hard (back of guitar, not used in first model)
I ended up with roughly 9000 images for 11 classes. Quality check takes some time - but it’s worth it!
You simply press a number key to assign a grade or d to delete, and you can always flick back and forth using the arrow keys. Once you are done, press x to terminate and write the report file…
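To make the key scheme concrete, here is a minimal sketch of the marking logic without any GUI. It is not the actual fcc code: `apply_key`, the key names `"left"`/`"right"`, the in-memory `report` dict and the choice that marking does not advance to the next image are all assumptions for this example.

```python
def apply_key(index, marks, key, n_files):
    """Handle one key press and return the (possibly changed) image index.

    marks maps an image index to "delete" or to an integer grade.
    """
    if key == "d":
        marks[index] = "delete"
    elif key.isdigit() and key != "0":
        marks[index] = int(key)  # number keys 1-9 assign a grade
    elif key == "right":
        index = min(index + 1, n_files - 1)  # stop at the last image
    elif key == "left":
        index = max(index - 1, 0)  # stop at the first image
    return index


# Hypothetical file names and key presses for one short session.
files = ["a.jpg", "b.jpg", "c.jpg"]
marks = {}
index = 0
for key in ["1", "right", "d", "right", "3", "left", "2", "x"]:
    if key == "x":  # x ends the session
        break
    index = apply_key(index, marks, key, len(files))

# Stand-in for the report: file name -> final mark.
report = {files[i]: mark for i, mark in sorted(marks.items())}

# Keep only grade 1 and 2 images, e.g. for a first model.
keep = [name for name, mark in report.items() if mark in (1, 2)]
```

Because you can flick back, a later key press simply overwrites an earlier mark: in this session `b.jpg` is first marked for deletion and then regraded to 2.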
I wrote about it here:
Repo is here:
Notebook with classifier is here (97% on 11 Gibson and Fender models; I only used grade 1 and 2 images for the classifier for the moment, will experiment with the others later):
Let me know with an issue or via these forums if you find any problems with it. Hope it’s useful to you…