So I decided to create a computer vision app inspired by both the transfer-learning approach taught in the first lecture and HBO’s Silicon Valley show.
The idea is to clone the Seefood app (which classifies an image as hotdog or not hotdog) and extend it to recognize other types of food, e.g. tacos and lobster.
I have two possible plans for the model, and I am not sure which approach is better.
Approach 1: the model is trained on the N food classes to be recognized, plus an "unknown" category made of random images of objects that belong to none of the N classes.
The last softmax layer should output:
[ P(hotdog),P(taco),P(lobster), … a bunch of food …, P(unknown)]
The model returns the category with the highest probability.
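A minimal sketch of the first approach's decision rule, assuming the model emits N+1 raw scores (logits) with the unknown class in the last position. The class list and the `predict_with_unknown_class` helper are hypothetical, and NumPy stands in for whatever framework actually runs the model:

```python
import numpy as np

# Hypothetical label list: the N food classes followed by the extra "unknown" class.
CLASSES = ["hotdog", "taco", "lobster", "unknown"]

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def predict_with_unknown_class(logits):
    """Approach 1: "unknown" is just another class, so take the argmax over all N+1 outputs."""
    probs = softmax(np.asarray(logits, dtype=float))
    return CLASSES[int(np.argmax(probs))], probs

label, probs = predict_with_unknown_class([0.5, 2.0, -1.0, 1.0])
print(label)  # taco
```

No threshold is involved here: the "unknown" label wins only when its own output is the largest.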
Approach 2: the model is trained only on the N food classes to be recognized.
The last softmax layer outputs:
[ P(hotdog),P(taco),P(lobster), … a bunch of food …]
We then pick a probability threshold.
If no class achieves a probability above the threshold, the model returns the unknown label.
If one or more classes exceed the threshold, we return the class with the highest probability.
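A minimal sketch of the second approach's rule, assuming the model outputs N raw scores with no unknown output. The class list, the `predict_with_threshold` helper and the 0.5 default are hypothetical placeholders, not a recommended threshold:

```python
import numpy as np

# Hypothetical label list: only the N food classes, no "unknown" output.
CLASSES = ["hotdog", "taco", "lobster"]

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def predict_with_threshold(logits, threshold=0.5):
    """Approach 2: return the top class only if its probability is above the threshold."""
    probs = softmax(np.asarray(logits, dtype=float))
    best = int(np.argmax(probs))
    # If even the top class is not above the threshold, no class is, so fall back to "unknown".
    if probs[best] <= threshold:
        return "unknown", probs
    return CLASSES[best], probs

print(predict_with_threshold([0.1, 0.2, 0.0])[0])  # unknown (top probability is about 0.37)
print(predict_with_threshold([4.0, 0.0, 0.0])[0])  # hotdog
```

Checking only the top probability is enough: if any class clears the threshold, the highest one does too. Because softmax outputs sum to 1, any threshold of 0.5 or more also means at most one class can ever clear it.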
- Which approach is better?
- If the first approach is better, how do I pick the right data distribution for the unknown category’s training data? (Do I just grab non-food pictures from random categories and dump them into the unknown training data?)
- If the second approach is better, how should I choose the threshold probability?
Any input is appreciated. Thanks!