After lesson 3, I created a multi-label image classifier that detects McDonald’s food items. It’s available at https://mcdonalds-item-detector.onrender.com/. You input a picture of a McDonald’s meal, and it tells you whether any ‘burger’, ‘fries’, ‘drink’, or ‘nuggets’ items are present.
I think food classification is a really interesting area. Some potential applications include improving food safety for people with allergies, making meals more accessible for people who are blind, and possibly reverse-engineering recipes.
- I used a resnet34 model with progressive resizing: I trained on 128px images and did some fine-tuning, then trained on 256px images and did some more fine-tuning.
- I was able to get to ~94% accuracy (with a threshold of 0.2)
- I followed Jeremy’s suggestion in the lecture, and did a lot of playing around with learn.recorder.plot_losses() to see the effects of different learning rates
- Getting the data for this was a pain. I was able to automate the image downloading, but then I created a csv file and had to manually input tags for all of my images. I ended up using only ~150 images because it was so tedious. I definitely have a much greater appreciation for people who put together huge datasets!
- Code for my model is available at GitHub
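
For anyone wondering what accuracy "with a threshold" means for a multi-label model, here is a minimal plain-Python sketch. It is not the code from the project, just the same idea as a thresholded accuracy metric: every label gets its own sigmoid probability, a label counts as present when that probability exceeds the threshold, and accuracy is the fraction of (image, label) pairs that come out right. The logits and targets below are made-up illustration values, not results from the model.

```python
import math

LABELS = ["burger", "fries", "drink", "nuggets"]

def sigmoid(x):
    """Map a raw model output (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def accuracy_thresh(logits, targets, thresh=0.2):
    """Fraction of (image, label) pairs predicted correctly.

    A label is predicted present when sigmoid(logit) > thresh.
    """
    correct = 0
    total = 0
    for row_logits, row_targets in zip(logits, targets):
        for logit, target in zip(row_logits, row_targets):
            predicted = sigmoid(logit) > thresh
            correct += int(predicted == bool(target))
            total += 1
    return correct / total

# Hypothetical outputs for two images, one column per entry in LABELS.
example_logits = [
    [2.0, -3.0, 0.5, -2.0],   # image 1: burger and drink
    [-1.0, 1.5, -4.0, -0.5],  # image 2: fries only
]
example_targets = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]

acc_low = accuracy_thresh(example_logits, example_targets, thresh=0.2)
acc_mid = accuracy_thresh(example_logits, example_targets, thresh=0.5)
```

On this toy data the 0.2 threshold flags two borderline labels as present that a 0.5 threshold would reject, so a low threshold trades some false positives for fewer missed items. Which setting scores better depends entirely on the data.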