Butterfly classification
Butterflies are very beautiful. I’ve often wanted to know the names of the butterflies that I observe in the wild. In my home country of Malaysia there are over 1000 identified species of butterfly, each with its own distinct features and coloration. Making a classifier to help with conservation and casual identification seems like a worthy project to set myself to.
To start with, I made a classifier that identifies a butterfly at the family rank. This also seems like a good case for benchmarking the fine-grained classification capabilities of the resnet34 model, as the visual differences between butterfly families are often very subtle.
These are the notable visual features that distinguish each of the six families:
- Swallowtails (Family Papilionidae): Notable for having tail-like appendages at the end of the wings
- Brush-footed Butterflies (Family Nymphalidae): The largest family of butterflies, called brush-footed for having tiny forelegs that are used as tasting appendages.
- Whites and Sulphurs (Family Pieridae): Most Pieridae have white or yellow wings with markings in black or orange.
- Gossamer-winged Butterflies (Family Lycaenidae): Tiny butterflies that have wings that are often streaked with bright colours.
- Metalmarks (Family Riodinidae): Wings of butterflies of this family are notable for the metallic-looking spots on the wings.
- Skippers (Family Hesperiidae): Should be the easiest to differentiate, skippers have a robust thorax similar to a moth, and antennae that end with a hook.
I acquired ~200 example images of each family from Google Images, using the wonderful little JavaScript tool written by @melonkernal to exclude irrelevant images and collect the image URLs for my dataset.
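For anyone building a similar dataset, here is a minimal sketch of how the collected URLs could be tidied up before downloading. It is not the code I actually ran. It assumes you have one list of URLs per family, and the function names `clean_urls` and `plan_downloads`, the `data` folder and the `.jpg` fallback are all made up for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse

# Assumption: only these extensions are treated as known image types.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def clean_urls(lines):
    """Drop blanks, non-HTTP entries and duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for line in lines:
        url = line.strip()
        if not url or urlparse(url).scheme not in ("http", "https"):
            continue
        if url in seen:
            continue
        seen.add(url)
        cleaned.append(url)
    return cleaned


def plan_downloads(urls_by_family, root="data"):
    """Map each URL to a target path like data/<family>/00000.jpg."""
    plan = []
    for family, lines in urls_by_family.items():
        for i, url in enumerate(clean_urls(lines)):
            suffix = Path(urlparse(url).path).suffix.lower()
            if suffix not in IMAGE_SUFFIXES:
                suffix = ".jpg"  # assumption: fall back to .jpg when unclear
            plan.append((url, Path(root) / family / f"{i:05d}{suffix}"))
    return plan
```

Keeping one folder per family means the folder name can double as the label when the images are loaded for training.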
After initial training, this is the learning rate plot:

Here is the training after unfreezing, along with the confusion matrix:
The accuracy I got for the final model was 83%, which seems pretty good for a fine-grained problem, but I think I overdid it on the number of epochs. The error rate goes up and down from epoch to epoch, which I think is a sign that it’s overfitting.
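To make the link between a confusion matrix and the accuracy figure concrete, here is a small sketch in plain Python. It assumes rows are the actual family and columns are the predicted family. The numbers in `toy` are invented for illustration and are not my results, and the helper names are hypothetical.

```python
def accuracy(matrix):
    """Overall accuracy: correct predictions (the diagonal) over all predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def most_confused(matrix, labels, min_count=1):
    """Off-diagonal cells as (actual, predicted, count), largest count first."""
    pairs = [
        (labels[i], labels[j], count)
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j and count >= min_count
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Made-up numbers for illustration only, not the results from this post.
labels = ["Pieridae", "Lycaenidae", "Hesperiidae"]
toy = [
    [18, 2, 0],
    [3, 15, 2],
    [0, 1, 19],
]
print(accuracy(toy))
print(most_confused(toy, labels, min_count=2))
```

The off-diagonal cells are the interesting part: they show which pairs of families the model mixes up most often.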

And here are the top losses:
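In case the term is unfamiliar: the top losses are the validation images where the model gave the lowest probability to the correct class. Below is a rough sketch of the idea, assuming cross-entropy loss and a list of predicted probabilities per image. The data is invented and `top_losses` here is my own illustrative helper, not a library function.

```python
import math


def top_losses(probs, targets, k=3):
    """Return (index, loss) for the k samples with the highest cross-entropy loss."""
    losses = [
        (i, -math.log(max(p[t], 1e-12)))  # clamp to avoid log(0)
        for i, (p, t) in enumerate(zip(probs, targets))
    ]
    return sorted(losses, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up predictions over three classes, for illustration only.
probs = [
    [0.90, 0.05, 0.05],  # confident and correct -> small loss
    [0.10, 0.80, 0.10],  # confident but wrong   -> large loss
    [0.40, 0.35, 0.25],  # unsure                -> medium loss
]
targets = [0, 0, 0]
print(top_losses(probs, targets, k=2))
```

Images that are confidently wrong float to the top, which is why this view is handy for spotting mislabelled or irrelevant images.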
The gist for my notebook can be found here
I deployed the app to Heroku by following the examples of @simonw and @nikhil_no_1:
https://butterfly-family-classifier.herokuapp.com/
What I would do differently:
- Improve the data quality. Many of the most confused images were dirty, so I should look into more ways of cleaning the data to improve accuracy.
- Automate the building of my dataset. I’d like to be able to scale up to 1000 classes if I take it further and attempt species-rank classification.
- Fewer epochs. With transfer learning, we already have pretrained weights that work pretty well. Running the extra epochs didn’t really improve the accuracy very much.
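On the last point, one simple way to decide when to stop is to track the validation error per epoch and give up once it stops improving. This is only a sketch of that idea under my own assumptions: `best_epoch` and `patience` are hypothetical names, epochs are counted from zero, and the error rates are invented.

```python
def best_epoch(error_rates, patience=2):
    """Return (epoch, error) of the best epoch, scanning until `patience`
    consecutive epochs fail to improve on the best error seen so far."""
    best_i, best_err = 0, error_rates[0]
    stale = 0
    for i, err in enumerate(error_rates[1:], start=1):
        if err < best_err:
            best_i, best_err, stale = i, err, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_i, best_err


# Made-up validation error rates per epoch, for illustration only.
errors = [0.30, 0.22, 0.19, 0.21, 0.18, 0.20, 0.19, 0.21]
print(best_epoch(errors, patience=2))
```

A larger `patience` tolerates more bouncing around before stopping, at the cost of running a few extra epochs.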
I’d love to hear feedback!
It’s my first time getting into fast.ai but I feel like I’ve learnt a huge amount from just working on a single problem. Thanks @jeremy @rachel for making this course available for us!