My homeland of Trinidad and Tobago is known as “The Land of the Hummingbird”, so a while ago I decided to try building an image classifier for the 17 recorded species of hummingbirds found there. The problem was data, which the lesson 2 download notebook solved. I initially adapted the notebook for just 3 species and got about a 25% error rate, roughly the same as my results in a notebook using the Birds Species dataset, so things were looking promising.
However, the error rate worsened to about 37% once I was up to 9 species. This notebook shows that run, with unpruned data from Google for the hummingbird species.
The errors in the images Google was retrieving were not non-bird images; rather, Google Images was returning the wrong species. The FileDelete tool therefore wasn’t quite apt for pruning in this case, as it only showed the image, not how it was classified.
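One way to review each image next to its assigned label is sketched below. It is only a minimal sketch, and it assumes a hypothetical layout with one folder per species under a root directory; the function name `list_labeled_images` and the suffix list are illustrative and do not come from the notebook.

```python
from pathlib import Path

# Assumption: these are the only image types in the download folders.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def list_labeled_images(root):
    """Return sorted (label, filename) pairs, taking the label from the parent folder name."""
    pairs = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            pairs.append((path.parent.name, path.name))
    return sorted(pairs)
```

Printing these pairs, or showing them beside each image in a notebook, makes it possible to spot a file that sits in the wrong species folder.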
However, once the data was pruned, I also subdivided some species into males and females where there were clear distinguishing characteristics between the two genders. I ended up with 14 categories of pruned data and got back to about a 25% error rate. This notebook shows the pruned data with 14 categories of hummingbird, by species and in some cases by gender.
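With this many categories, a single overall error rate can hide which classes are being confused. The following is a minimal sketch of a per-category breakdown, assuming you already have a list of (actual, predicted) label pairs from a validation set; the function name `per_class_error_rate` and the labels in the usage line are hypothetical.

```python
from collections import Counter

def per_class_error_rate(pairs):
    """Map each actual label to the fraction of its examples that were misclassified."""
    totals = Counter()
    errors = Counter()
    for actual, predicted in pairs:
        totals[actual] += 1
        if actual != predicted:
            errors[actual] += 1
    return {label: errors[label] / totals[label] for label in totals}

# Hypothetical labels, for illustration only.
rates = per_class_error_rate([("a_male", "a_male"), ("a_male", "a_female"), ("b", "b")])
```

Categories with a high rate are the ones whose training images are most worth pruning or splitting further.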
After seeing @simonw’s post and exploring his source code, I pushed my own model into a similar Docker image and deployed it to an Azure website at https://hummingbirds.azurewebsites.net/ if anyone wants to try it out for themselves.
Hopefully, as I explore more, I can pretty up the results UI a bit and add the remaining species as I build up the pruned training dataset over time.
Thanks to @sparalic, whose sharing of her own work gave me the confidence to share mine. A re-edit of this post is also published as my blog post for this week’s learning.