What is the best way to model the ‘other’ class in an image classification problem?
In other words, for simplicity, let’s say I built an image classifier with 3 classes: dogs, cats and giraffes. I gather the training data required to build the 3-class classifier. When I deploy the model to production, I get some dogs, cats and giraffes, which is no problem, because the model can identify them easily. But I also get a host of images that contain no cats, dogs or giraffes at all: elephants, zebras, or random selfies. So how does one model the ‘other’ category?
Is it better to:
a) build a 3-class classifier, predict ‘other’ when the softmax probabilities are not peaked towards a particular class, and tune a threshold? or
b) gather training data for an ‘other’ class and train a 4-class model (cats, dogs, giraffes, other/everything_else)? or
c) is there a better way to formulate the problem?
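Option a) can be sketched in a few lines of Python. This is a minimal illustration, assuming the model returns raw logits in the order [dog, cat, giraffe]; the function names, the 0.7 default and the candidate thresholds are hypothetical. Note that tuning the threshold still needs a validation set that contains some ‘other’ images.

```python
import math

# Hypothetical class order for the 3-class model.
CLASSES = ["dog", "cat", "giraffe"]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_other(logits, threshold=0.7):
    """Return (label, top_prob); label is 'other' if the top probability is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "other", probs[best]
    return CLASSES[best], probs[best]

def tune_threshold(val_logits, val_labels, candidates):
    """Pick the candidate threshold with the best accuracy on a validation set
    whose labels include 'other' for out-of-class images."""
    def accuracy(t):
        hits = sum(predict_with_other(z, t)[0] == y for z, y in zip(val_logits, val_labels))
        return hits / len(val_labels)
    return max(candidates, key=accuracy)

# Usage: a confident prediction versus a nearly flat one.
print(predict_with_other([4.0, 0.5, 0.2]))  # top probability is about 0.95 -> 'dog'
print(predict_with_other([1.0, 0.9, 0.8]))  # top probability is about 0.37 -> 'other'
```

One caveat worth keeping in mind: a softmax over three classes always sums to 1, so an unfamiliar image can still produce a confident score for one class. The threshold only catches the cases where the model happens to be unsure.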
I think ultimately it depends on your classification goals.
Are you trying to detect a single class per image, or several classes together? For example, an image may contain only a lion, or it may contain a lion and an elephant.
In that case, it’s what is called a multi-label problem, where each photo can have several positive outputs, e.g. [1,0,0,1] for [giraffe, lion, zebra, elephant].
However, if it’s a multi-class problem, where the outputs are mutually exclusive and an image CANNOT belong to more than one class at a time, then yes, you would use softmax as your output.
For multi-label problems, I use a sigmoid output and binary cross-entropy as my loss. However, the current state of the art may be different now.
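A minimal pure-Python sketch of the sigmoid plus binary cross-entropy setup, using the [giraffe, lion, zebra, elephant] example. The logits are made-up numbers and the function names are hypothetical; a real model would produce the logits from its final layer.

```python
import math

# Hypothetical label order, taken from the example above.
LABELS = ["giraffe", "lion", "zebra", "elephant"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(targets, logits, eps=1e-7):
    """Mean BCE over labels: each label is scored as an independent yes/no."""
    total = 0.0
    for y, z in zip(targets, logits):
        p = min(max(sigmoid(z), eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(targets)

def predict_labels(logits, threshold=0.5):
    """Every label whose sigmoid probability clears the threshold is predicted."""
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]

# Usage: a photo with a giraffe and an elephant, target [1, 0, 0, 1].
logits = [3.0, -2.0, -1.5, 2.5]
print(predict_labels(logits))  # ['giraffe', 'elephant']
print(binary_cross_entropy([1, 0, 0, 1], logits))
```

The key difference from softmax is that each sigmoid is computed independently, so two labels can both be close to 1. A softmax over the same four logits would force the probabilities to sum to 1.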
As opposed to categorical cross-entropy?
For multi-label, I use multiple outputs with categorical cross-entropy for each. I prefer this to binary classification since I don’t need to normalize the input to be within the 0–1 range.
Interesting. I haven’t seen that before, but I mainly focus on NLP and NER, and it’s been a while since I last studied the cutting-edge stuff!
The best approach would be to throw a mix of odds and ends into your original training data and label them ‘other’. That way the model can learn to recognize things that don’t fit into the other groups.
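One way to read “multiple outputs with categorical cross-entropy for each” is one two-way softmax head per label, scoring [absent, present]. The sketch below assumes that interpretation; the head layout and function names are hypothetical. It also checks a useful identity: for a single head, softmax([a, b]) gives the “present” class probability sigmoid(b − a), so each head is closely related to a sigmoid output on the difference of its two logits.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot, logits, eps=1e-7):
    probs = softmax(logits)
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

def multi_head_loss(targets, head_logits):
    """targets: one 0/1 flag per label.
    head_logits: one [absent, present] logit pair per label (one output head each)."""
    losses = [
        categorical_cross_entropy([1 - y, y], pair)  # index 0 = absent, 1 = present
        for y, pair in zip(targets, head_logits)
    ]
    return sum(losses) / len(losses)

# Per head, softmax([a, b])[1] equals sigmoid(b - a).
a, b = 0.3, 1.7
print(softmax([a, b])[1], 1.0 / (1.0 + math.exp(-(b - a))))
```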
Thank you all for the feedback. I will try building out the training data for the ‘other’ class and evaluate the classifier.
@stevelizcnao: What I originally meant was that in my case the classes are mutually exclusive. Each image is either a dog, a cat, a giraffe, or other.
I haven’t taken the time to go through the fast.ai library in detail yet, but having a helper function that randomly pulls images from ImageNet (or its resized bcolz array) to tack on an “other” class of images to the batch iterator feels appealing.
Please let me know if there’s any value in this; it might save people the effort of collecting random images from outside their actual dataset.
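A minimal sketch of such a helper, written as standalone Python rather than as part of fastai. It assumes the source images are stored as image files in class subfolders on disk (not in a bcolz array); `sample_other_images` and its `exclude` parameter are hypothetical names. `exclude` is meant to skip folders belonging to classes you already model, so that dogs, cats or giraffes do not leak into ‘other’.

```python
import random
import shutil
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def sample_other_images(source_dir, dest_dir, n, seed=0, exclude=()):
    """Copy up to n randomly chosen image files from source_dir (searched
    recursively) into dest_dir, skipping files whose parent folder name is
    in exclude. Returns the number of files copied."""
    candidates = sorted(
        p for p in Path(source_dir).rglob("*")
        if p.is_file()
        and p.suffix.lower() in IMAGE_EXTS
        and p.parent.name not in exclude
    )
    # A seeded Random makes the sample reproducible between runs.
    chosen = random.Random(seed).sample(candidates, min(n, len(candidates)))
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for i, src in enumerate(chosen):
        # Rename on copy so files from different source folders cannot collide.
        shutil.copy(src, dest / f"other_{i:05d}{src.suffix.lower()}")
    return len(chosen)
```

Writing the sampled files into a `train/other` folder next to the existing class folders lets any folder-per-class data loader pick up ‘other’ as a fourth class without further changes.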
I have a similar problem. I have a dataset (train and test) with 3 inputs (image_band_1, image_band_2, and angle), and the target variable is binary (0 or 1). I used two convolutional networks, one for band_1 and another for band_2, flattened their outputs and merged them with a mean, then concatenated a separate input layer for the angle and fed the result into a dense layer of 10 units and then into a single unit with sigmoid activation. I used Adam as the optimizer and binary cross-entropy as the loss. It gives me 87% accuracy (I haven’t tuned the hyperparameters yet), which is quite good, but I would like to know whether another architecture would fit this type of data better.
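To make the data flow of that architecture concrete, here is a NumPy sketch of a single forward pass with random weights (no training, no Adam). Several details are assumptions not stated in the post: one 3×3 filter per branch, ReLU after the convolution and after the 10-unit dense layer, and an 8×8 toy image size. In Keras, the merge steps would correspond to the `Average` and `Concatenate` layers.

```python
import numpy as np

def conv_branch(img, kernel):
    """Single-filter 'valid' 2D convolution + ReLU (assumed), flattened to a vector."""
    kh, kw = kernel.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0).ravel()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(band_1, band_2, angle, p):
    f1 = conv_branch(band_1, p["k1"])          # branch for band_1
    f2 = conv_branch(band_2, p["k2"])          # branch for band_2
    merged = (f1 + f2) / 2.0                    # merge by element-wise mean
    x = np.concatenate([merged, [angle]])       # append the angle input
    h = np.maximum(p["W1"] @ x + p["b1"], 0.0)  # Dense(10), ReLU assumed
    return sigmoid(p["w2"] @ h + p["b2"])       # Dense(1), sigmoid -> P(target = 1)

def binary_cross_entropy(y, prob, eps=1e-7):
    prob = np.clip(prob, eps, 1 - eps)
    return -(y * np.log(prob) + (1 - y) * np.log(1 - prob))

def init_params(img_size, k=3, hidden=10, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = (img_size - k + 1) ** 2 + 1       # flattened conv output + angle
    return {
        "k1": rng.normal(0, 0.1, (k, k)),
        "k2": rng.normal(0, 0.1, (k, k)),
        "W1": rng.normal(0, 0.1, (hidden, n_feat)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0, 0.1, hidden),
        "b2": 0.0,
    }

# Usage on random toy inputs.
params = init_params(8)
rng = np.random.default_rng(1)
prob = forward(rng.normal(size=(8, 8)), rng.normal(size=(8, 8)), 0.5, params)
loss = binary_cross_entropy(1, prob)
print(prob, loss)
```

Averaging the two flattened branches only works because both branches produce vectors of the same length; concatenating them instead would keep band-specific features separate at the cost of a larger dense layer.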
I think there’s value, but it’s a bit too special-case to put into fastai. Instead, perhaps share it as a gist, or as a pip-installable library of its own?