Clarification about softmax


Hi, I have a problem understanding something about softmax.
Let’s say I have 5 classes and an input that doesn’t fit any class, and that my model outputs a vector of activations with 4 negative activations and 1 very small positive activation.
Softmax will give me high confidence for that class (the one with the positive activation).
Is that correct?


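Since the output vector of softmax adds up to 1, the “very small positive” activation can end up with high confidence for that class, as long as the other activations are clearly lower. Softmax only looks at the differences between activations, not at their signs. As Jeremy explains in his lessons, softmax is excellent for multi-class classification where each input belongs to exactly one class.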
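To make the numbers concrete, here is a minimal sketch in plain Python. The activation values are made up for illustration (they are not from the thread), and the `softmax` helper is written by hand rather than taken from any library.

```python
import math

def softmax(activations):
    # Subtract the max for numerical stability; this does not change the result.
    m = max(activations)
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up activations: 4 negative, 1 slightly positive.
activations = [-3.0, -2.0, -4.0, -3.0, 0.1]
probs = softmax(activations)

print([round(p, 3) for p in probs])
print(sum(probs))  # 1.0 up to floating-point rounding
print(max(probs))  # roughly 0.81 for these activations

# Softmax only sees differences: shifting every activation by the same
# constant leaves the probabilities unchanged.
shifted = softmax([a + 10.0 for a in activations])
```

So an activation of 0.1 is enough to get about 81% here, simply because the other four are much lower.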


Softmax is not meant to model such a scenario. If you have 5 classes, softmax will still try to assign the object to one of them.

If you have 5 classes and some images don’t belong to any of them, one way of modelling this is to create an artificial 6th ‘no class’ class. You would then need to modify your architecture and your target data so that softmax outputs 6 values instead of 5 (images not belonging to any of the real classes are assigned to the 6th ‘no class’ class).

Regardless of what values enter the softmax layer, its outputs will sum to 1.
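A rough sketch of what the extra ‘no class’ category means for the target data, in plain Python. The class names and the `encode_target` helper are hypothetical, made up only for illustration; the thread does not name the 5 classes.

```python
# Hypothetical class names; the original 5 classes are not named in the thread.
classes = ["cat", "dog", "horse", "bird", "fish"]
classes_with_none = classes + ["no_class"]

def encode_target(label):
    # Map a label to its index; anything outside the 5 real classes
    # (including None) goes to the extra 'no class' index.
    if label in classes:
        return classes.index(label)
    return classes_with_none.index("no_class")

n_outputs = len(classes_with_none)  # the final layer now needs 6 outputs
targets = [encode_target(label) for label in ["dog", None, "fish", "car"]]
print(n_outputs, targets)
```

The model itself would then need training examples that actually carry the ‘no class’ label, otherwise it has nothing to learn that 6th output from.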


Thank you radek, that could solve the problem I was thinking about: marking such a sample as “irrelevant” instead of assigning it to a class it doesn’t belong to just because of the way softmax works.
Thank you very much.

(marc) #5

In one course, Jeremy mentions an alternative to having an extra category for no_class. Instead of using a softmax, he recommends using binary cross entropy on each class, so that each class gets its own independent probability. You get no_class when none of the classes scores high enough, i.e. when the input doesn’t belong to any class.
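Here is a minimal sketch of that idea in plain Python, assuming a sigmoid on each activation and a cut-off threshold chosen by you. The helper names (`sigmoid`, `bce_per_class`, `predict`), the threshold values and the activations are all assumptions for illustration, not something taken from the course.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def bce_per_class(activations, targets):
    # Mean binary cross entropy over classes; targets are 0 or 1 per class.
    # Fine for moderate activations; extreme values would need clamping.
    losses = []
    for a, t in zip(activations, targets):
        p = sigmoid(a)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(losses) / len(losses)

def predict(activations, threshold=0.5):
    # Indices of all classes whose independent probability passes the threshold.
    # An empty list means "no class".
    return [i for i, a in enumerate(activations) if sigmoid(a) >= threshold]

# Made-up activations: 4 negative, 1 slightly positive.
activations = [-3.0, -2.0, -4.0, -3.0, 0.1]

print(round(sigmoid(0.1), 3))               # about 0.525, far from confident
print(predict(activations, threshold=0.5))  # class 4 barely passes
print(predict(activations, threshold=0.6))  # empty list: no class

# A sample that belongs to no class has an all-zero target vector.
no_class_target = [0, 0, 0, 0, 0]
print(bce_per_class(activations, no_class_target))
```

Because the probabilities no longer have to sum to 1, the small positive activation only gets about 0.52 here instead of the roughly 0.81 that softmax would give it on the same inputs.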