Hi everybody!
While watching lesson 1 of the new course I was wondering where to get a big forest/nature-related dataset to build an image classifier on, as this is the domain I come from.
I finally found the ImageCLEF Plant Identification Challenge 2013, which provides an already labeled training dataset of 10485 images (25GB) covering 250 plant species. Most of the images show leaves, but there are also images of flowers, fruit, stems & entire plants.
For the classifier I used the images with a uniform background (category=SheetAsBackground), which only contain leaves: 4921 samples and 124 classes.
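A minimal sketch of how such a subset could be selected, assuming the image metadata has already been parsed into a list of dicts. The keys `file`, `category` and `species` and the toy records are hypothetical, not the dataset's actual schema. The comparison ignores case so that small differences in how the category name is capitalized do not matter.

```python
def select_category(records, category):
    """Return the records whose 'category' matches, ignoring case."""
    wanted = category.lower()
    return [r for r in records if r["category"].lower() == wanted]

# Toy records for illustration only; real ones would come from the
# dataset's annotation files (field names here are assumptions).
records = [
    {"file": "1.jpg", "category": "SheetAsBackground", "species": "species_a"},
    {"file": "2.jpg", "category": "NaturalBackground", "species": "species_a"},
    {"file": "3.jpg", "category": "SheetAsBackground", "species": "species_b"},
]

sheet = select_category(records, "SheetAsBackground")
classes = sorted({r["species"] for r in sheet})
print(len(sheet), "samples,", len(classes), "classes")
```

Counting samples and classes right after filtering is a quick sanity check against the numbers quoted above.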
I started by training a pretrained resnet34 and already got the error rate down to ~3% after 17 epochs. Interestingly, fine-tuning didn't drastically improve accuracy/loss.
What to do next:
- Train network for category=NaturalBackground
- Maybe exclude classes with fewer than 10 samples?
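The second idea could look roughly like this. It is only a sketch: the function name `drop_rare_classes`, the threshold parameter and the toy labels are made up for illustration and are not taken from the notebook.

```python
from collections import Counter

def drop_rare_classes(labels, min_samples=10):
    """Return indices of samples whose class has at least min_samples items,
    together with the set of classes that were kept."""
    counts = Counter(labels)
    keep = {cls for cls, n in counts.items() if n >= min_samples}
    kept_idx = [i for i, lab in enumerate(labels) if lab in keep]
    return kept_idx, keep

# Toy labels: two classes meet the threshold, one does not.
labels = ["oak"] * 12 + ["beech"] * 10 + ["rare_fern"] * 3
kept_idx, kept_classes = drop_rare_classes(labels, min_samples=10)
print(len(kept_idx), sorted(kept_classes))  # 22 ['beech', 'oak']
```

Returning indices rather than labels makes it easy to apply the same selection to the list of image paths.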
Below you can find a gist of the notebook I used. I am looking forward to your feedback on what I can improve or what else could be done with this dataset.
Cheers,
Harald