I wanted to explore building a custom classifier using lesson 1 as the template. Cricket and baseball are two sports with visually similar ‘features’, so I tried to distinguish between them. I got reasonable accuracy (90%) with just 15 training images per class. I followed these steps:
- Google Image search for ‘cricket’ and ‘baseball’
- Download 20 images each of cricket and baseball.
- Use 15 of each for training and 5 for testing (the valid dir), then run the lesson 1 Python notebook.
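For anyone reproducing the steps above, the 15/5 split per class can be scripted instead of done by hand. A minimal sketch using only the standard library (the directory layout matches what the lesson 1 notebook expects, but the function name and fixed seed are my own choices):

```python
import random
import shutil
from pathlib import Path

def split_images(src_dir, dest_dir, label, n_train=15, seed=42):
    """Copy images for one class into dest_dir/train/<label> and
    dest_dir/valid/<label>, with a reproducible random 15/5 split."""
    images = sorted(Path(src_dir).glob("*"))
    random.Random(seed).shuffle(images)
    for subset, files in [("train", images[:n_train]),
                          ("valid", images[n_train:])]:
        out = Path(dest_dir) / subset / label
        out.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy(f, out / f.name)
```

Running this once per class (`cricket`, `baseball`) gives the train/valid tree the notebook can point at.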
After experimenting with the model, I found that a learning rate of 0.1 gave good results: 90% accuracy, classifying 9/10 test images correctly. Since the number of training images is tiny, I did not really use the lr_find() tool to find the optimal learning rate.
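For reference, lr_find() runs an LR range test: it trains briefly while growing the learning rate exponentially and records the loss at each step. A rough sketch of just the schedule it sweeps (the endpoint values here are illustrative, not fastai's exact defaults):

```python
def lr_schedule(lr_min=1e-5, lr_max=10.0, n_steps=100):
    """Exponentially spaced learning rates, as swept by an LR range test:
    each step multiplies the learning rate by a constant factor."""
    ratio = (lr_max / lr_min) ** (1 / (n_steps - 1))
    return [lr_min * ratio ** i for i in range(n_steps)]

# One would train one mini-batch at each LR, record the loss, and pick
# a rate roughly an order of magnitude below where the loss diverges.
```

With only 15 images per class there are barely enough mini-batches for the loss curve to be informative, which is consistent with skipping it here.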
I also noticed that unfreezing and data augmentation did not improve performance much. My guess is that the number of images was too small for the initial layers to generalize from, so it is better to rely on the pre-trained weights from the ResNet model as much as possible.
The one incorrectly classified image is shown below. By running the model several times and varying hyperparameters, it was occasionally possible to get 100% accuracy. I'm not sure 100% accuracy means much, though, since the number of test images is so low. Going forward, I'd like to try this with a larger number of validation images to see how the model generalizes.
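To put numbers on why scores from a 10-image test set are noisy, here is a back-of-envelope check (my own, not from the notebook): even a coin-flip classifier scores 9+/10 about 1% of the time, and the standard error on a 90% accuracy estimate from 10 images is nearly 10 percentage points.

```python
from math import comb, sqrt

def binom_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance a random guesser
    gets at least k of n test images right."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def accuracy_stderr(acc, n):
    """Normal-approximation standard error of an accuracy estimate on n images."""
    return sqrt(acc * (1 - acc) / n)

# binom_tail(10, 9) = 11/1024, about 0.011
# accuracy_stderr(0.9, 10) is about 0.095
```

So the difference between 9/10 and 10/10 is well within noise, which matches the intuition above.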
The full results for one run are in cricbase_15train.ipynb.
Suggestions, corrections, and other ideas are welcome. Some future directions I can think of:
- Automate the downloading of images, so that one can build better and more custom classifiers using the lesson 1 template. PS: figure out how to deal with image copyright issues?
- Identify techniques/practices that work better for small datasets, and apply them to domains where collecting data is hard or expensive (bio-medical images, industrial settings).
- Try out some fun examples: ship vs. submarine? Is the person wearing glasses or not?
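On the first direction above, a bare-bones downloader can be sketched with the standard library alone (the URL list and naming scheme here are hypothetical, and copyright/licensing checks would still have to be done separately):

```python
import urllib.request
from pathlib import Path

def image_filename(label, idx, ext="jpg"):
    """Local filename for the idx-th downloaded image of a class,
    e.g. 'cricket_003.jpg' (hypothetical naming scheme)."""
    return f"{label}_{idx:03d}.{ext}"

def download_images(urls, label, dest_dir):
    """Fetch each URL into dest_dir/<label>_NNN.jpg. No retries, no
    deduplication, no licence filtering; the URL list would come from
    e.g. a scraped image-search results page."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for i, url in enumerate(urls):
        urllib.request.urlretrieve(url, dest / image_filename(label, i))
```

Feeding the downloaded folders into a split script would then make the whole lesson 1 pipeline reproducible end to end.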