So I decided to create a computer vision app inspired by both the transfer-learning approach taught in the first lecture and HBO’s Silicon Valley show.
The idea is to clone the Seefood app (classify an image as hotdog or not hotdog) and extend it to recognize other types of food, e.g. tacos and lobsters.
I have two plans for the model, and I am not sure which is the better approach.
Approach 1:
- The model is trained on N classes of food to be recognized, plus an unknown category made up of random images of objects that do not belong to any of the N classes.
- The last softmax layer should output: [P(hotdog), P(taco), P(lobster), … a bunch of food …, P(unknown)]
- The model returns the category with the highest probability.
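To make Approach 1 concrete, here is a minimal sketch of the prediction step in plain Python. The label list, the example logits, and the function names (`softmax`, `predict_with_unknown_class`) are all hypothetical and only for illustration; in a real app the logits would come from the trained network's last layer.

```python
import math

# Hypothetical label set: N food classes plus one explicit "unknown" class.
LABELS = ["hotdog", "taco", "lobster", "unknown"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_unknown_class(logits, labels=LABELS):
    """Approach 1: return the label with the highest probability.

    "unknown" is just another class, so no threshold is needed.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits where the "unknown" score dominates.
label, prob = predict_with_unknown_class([0.1, 0.2, 0.3, 2.5])
print(label, round(prob, 3))
```

In this setup, the decision about what counts as unknown is learned entirely from the unknown category's training images.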
Approach 2:
- The model is trained only on the N classes of food to be classified.
- The last softmax layer outputs: [P(hotdog), P(taco), P(lobster), … a bunch of food …]
- We set a threshold probability.
- If no class achieves a probability above the threshold, the model returns the unknown label.
- If one or more classes achieve a probability above the threshold, we return the class with the highest probability.
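The thresholding rule in Approach 2 could look roughly like the sketch below. The labels, the default threshold of 0.7, and the function names are assumptions for illustration only, not a recommended setting.

```python
import math

# Hypothetical label set: food classes only, no "unknown" class in the model.
FOOD_LABELS = ["hotdog", "taco", "lobster"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_threshold(logits, threshold=0.7, labels=FOOD_LABELS):
    """Approach 2: return the top class only if it clears the threshold.

    The threshold value is an arbitrary placeholder.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] > threshold:
        return labels[best]
    return "unknown"


print(predict_with_threshold([5.0, 0.0, 0.0]))  # confident hotdog score
print(predict_with_threshold([1.0, 1.0, 1.0]))  # evenly split scores
```

Since the softmax probabilities over N classes sum to 1, the top probability is always at least 1/N, so a threshold below 1/N would never produce the unknown label.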
Questions:
- Which approach is better?
- If the first approach is better, how do I pick a good data distribution for the unknown category's training data? (Do I just grab non-food pictures from random categories and dump them into the unknown training data?)
- If the second approach is better, how should I decide on the proper threshold probability?
Any input is appreciated. Thanks!