Cats & Dogs: 96% accuracy is impressive, but how do you get near 100%?
The fastai v1 library is amazing: with a few lines of code you get excellent accuracy.
Classification works great in most cases, but what if you cannot afford to misclassify an item (because there could be economic or legal consequences)?
What could you do to boost your model?
Here is what I think could help, but I would like your opinion:
- Find more examples of your most confused categories (though they will become overrepresented)
- Train a separate network to identify only one category (so we would have a single model to identify american_pit_bull_terrier, another for staffordshire_bull_terrier, …)
- Maybe just train for much longer?
Would you have other ideas?
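To make the one-network-per-category idea concrete: it is essentially one-vs-rest classification, where each binary "expert" scores its own class and the highest score wins. A toy sketch in plain NumPy (synthetic 2-D clusters standing in for image features, all numbers made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, lr=0.1, steps=500):
    """Logistic regression via gradient descent: one 'expert' per class."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def one_vs_rest_predict(X, models):
    # Score every sample with each per-class model, pick the highest.
    scores = np.column_stack([sigmoid(X @ w + b) for w, b in models])
    return scores.argmax(axis=1)

# Toy data: three well-separated 2-D clusters, one per class.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 0], [0, 5]])
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)

models = [train_binary(X, (y == k).astype(float)) for k in range(3)]
pred = one_vs_rest_predict(X, models)
accuracy = (pred == y).mean()
```

With a real CNN the per-class models would share nothing, which is exactly the maintenance cost the replies below warn about.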
You’re always going to have to accept some level of risk (even with well-trained human classifiers). It’s worth knowing what risk you can accept.
Start with the fewest categories you can; when a prediction's probability falls below a certain threshold, have it independently checked by a human. I'm not sure whether having one network per category would help, but I'd try to stick to a single network (easier to maintain, and it often works well in practice).
You'll want an exceptionally well-labelled dataset; it's worth getting each item labelled by a few people to check for consistency. You may also want to train another network to identify ambiguous cases, which then get validated by a human.
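That threshold-plus-human-review routing can be sketched in a few lines of NumPy (the 0.9 threshold and the probabilities below are made-up numbers, not a recommendation):

```python
import numpy as np

def route_predictions(probs, threshold=0.9):
    """Split predictions into auto-accepted and human-review queues.

    probs: (n_samples, n_classes) array of softmax outputs.
    Returns (accepted_idx, review_idx).
    """
    confidence = probs.max(axis=1)
    accepted = np.where(confidence >= threshold)[0]
    review = np.where(confidence < threshold)[0]
    return accepted, review

probs = np.array([
    [0.98, 0.01, 0.01],   # confident -> auto-accept
    [0.55, 0.40, 0.05],   # ambiguous -> human review
    [0.10, 0.85, 0.05],   # below threshold -> human review
])
accepted, review = route_predictions(probs, threshold=0.9)
```

In practice you'd tune the threshold on a validation set so that the accepted queue has the error rate you can actually live with.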
Make sure your validation (and ideally training) data comes from the same place your test data will. Otherwise you can get 100% in validation, but poor performance in practice.
Have a sufficiently large validation set to get a good read of the accuracy (e.g. if you've only got 20 cases of a class in validation, a single lucky guess shifts that class's accuracy by 5%).
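The same point in numbers: the standard error of an accuracy estimate shrinks with the square root of the validation-set size, so a 96%-accurate model measured on 20 samples has roughly a ±4.4% margin, versus ±0.4% on 2,000 samples:

```python
import math

def accuracy_std_error(p, n):
    """Standard error of an accuracy estimate p measured on n samples
    (binomial approximation: sqrt(p * (1 - p) / n))."""
    return math.sqrt(p * (1 - p) / n)

se_small = accuracy_std_error(0.96, 20)    # ~0.044, i.e. +/- 4.4%
se_large = accuracy_std_error(0.96, 2000)  # ~0.004, i.e. +/- 0.4%
```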
Adding more training data where the classifier performs poorly is a good way to help it differentiate those classes. This will generally be much more beneficial than training for longer, which will only fit more tightly to whatever information is already in your training data.
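While collecting more data, a cheap interim step is to oversample the confused classes when building training batches. A sketch with NumPy, where the per-class error rates are hypothetical numbers you'd read off your confusion matrix:

```python
import numpy as np

# Hypothetical labels and per-class error rates from a confusion-matrix
# inspection: classes 1 and 2 (say, the two bull terriers) get confused.
labels = np.array([0] * 100 + [1] * 100 + [2] * 100)
per_class_error = np.array([0.01, 0.20, 0.25])

# Sample each example with probability proportional to its class's error rate.
weights = per_class_error[labels]
weights = weights / weights.sum()

rng = np.random.default_rng(0)
batch = rng.choice(len(labels), size=1000, p=weights, replace=True)
counts = np.bincount(labels[batch], minlength=3)
# the hard classes now dominate the sampled batch
```

Note this trades off against the "overrepresented" concern raised at the top of the thread, so keep the validation set untouched and watch for the easy class regressing.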
I'm finding this to be the hardest part of deep learning as well.
I was training a classifier on fish vs. not-fish as a test model with lots of data, and even though I get 98%+ on both training and validation, when I randomly scrape the web for images I've noticed that it continues to make mistakes on random things, despite adding more and more data. Half the time I can't tell why it is making a given mistake.
Even for simple classifications it seems you really need a lot of data? I don't know what the bound is, though.
I'm trying to find unique image transformations to artificially increase the data, and I think I will need to learn to build new models. However, I still have a lot to learn and am still playing around with transfer learning.
Training another model for ambiguous cases sounds interesting. I will give that a try.
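For the transformation idea: most label-preserving augmentations are simple array operations. A minimal sketch in plain NumPy (a horizontal flip plus a small random translation; fastai's own transforms cover far more than this):

```python
import numpy as np

def augment(img, rng):
    """Cheap label-preserving transforms: random flip plus a small shift.

    img: (H, W, C) uint8 array. Returns a new array of the same shape.
    """
    out = img
    if rng.random() < 0.5:                # horizontal flip
        out = out[:, ::-1]
    dy, dx = rng.integers(-4, 5, size=2)  # shift by up to 4 pixels
    out = np.roll(out, (int(dy), int(dx)), axis=(0, 1))
    return out.copy()

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
aug = augment(img, rng)
# same shape and dtype, (usually) different pixel layout
```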
You can and should tailor loss functions (or how you act on the model's outputs) to the real-life impact of false positives and false negatives. For instance, you are probably willing to accept many more false positives in order to reduce false negatives when classifying a medical condition, so that a specialist can take a closer look.
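One way to act on that asymmetry without touching the loss function is to tune the decision threshold against an explicit cost ratio. A sketch in NumPy, where the 10:1 false-negative-to-false-positive cost and the synthetic scores are purely illustrative:

```python
import numpy as np

def pick_threshold(probs_positive, labels, cost_fn=10.0, cost_fp=1.0):
    """Choose the decision threshold that minimises expected cost.

    A false negative (e.g. a missed medical condition) is assumed to
    cost 10x a false positive, so the chosen threshold ends up low:
    we flag many cases for a specialist rather than miss one.
    """
    best_t, best_cost = 0.5, np.inf
    for t in np.linspace(0.05, 0.95, 19):
        pred = probs_positive >= t
        fn = np.sum((~pred) & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

# Synthetic scores: positives tend to score higher than negatives.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)
probs = np.clip(labels * 0.4 + rng.normal(0.3, 0.2, size=500), 0, 1)
t = pick_threshold(probs, labels)
# with expensive false negatives, the chosen threshold sits well below 0.5
```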
Machine learning models are better thought of as 'augmenting' intelligent decisions rather than as a wholly 'artificial' replacement. Classify and automate the (e.g. 90% of) 'sure' determinations, and funnel the remainder (e.g. 10%) through a more reliable but resource-constrained process, such as human expert attention. In real life there often isn't a short-term goal of reaching 100% accurate classification.
If you want to improve your cats/dogs example, the information is right there in the fastai lectures: use augmentation, progressive training, intelligent consideration of hyperparameters, more training time, and, perhaps above all, enough raw data. With all this, you should be able to reach human levels of classification.
For reference, I was able to get the cats-and-dogs error down to 0.008 (0.8%) by using progressive resizing: Progressive resizing = got the pets data in lesson 1 to .008 error(!)
The only ones it couldn't get correct were the two bull terriers, which would likely require more data for those two classes, or maybe there simply aren't enough distinguishing visual features between them (recall Jeremy saying it seemed the only difference was that one's nose was a bit redder).