Default metrics in multi-label classification

I would like to know what the default metric is for multi-label classification. It is shown as ‘accuracy’ for binary classification, but as ‘lambda’ for multi-label.

Do you have any idea how the sigmoid works for a multi-label problem?
I got stuck in the satellite notebook. Is it true that we just have to replace the softmax with a sigmoid in the final layer? But how could a sigmoid output an N-d vector? Or do we have to train N sigmoids, one for each of the N labels?
What I don’t understand is how to calculate the loss for multi-label classification.
If you have any clue about this, please share some of the details. Thanks!

Use the binary cross-entropy loss function, applied to each label independently.
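
For multi-label targets, cross entropy is usually applied per label as binary cross-entropy and then averaged. Here is a minimal sketch in plain Python, assuming the predictions are already sigmoid probabilities; the function name `binary_cross_entropy` and the `eps` clamp are illustrative choices, not taken from the notebook or any library.

```python
import math

def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over the labels of one sample."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        # Each label contributes its own two-class log loss.
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)

probs = [0.8, 0.4, 0.6, 0.1]   # one probability per label
targets = [1, 0, 1, 0]         # multi-hot ground truth
loss = binary_cross_entropy(probs, targets)
print(loss)
```

Each label is scored as its own yes/no problem, so the loss does not require the probabilities to sum to 1.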

Thanks! But what I don’t understand is how a sigmoid could output something like [0.8, 0.4, 0.6, 0.1, …]. I mean, a sigmoid is just 1-d, right?

Sigmoid and softmax are not the same thing.
With a softmax, the probabilities necessarily add up to 1.

Sigmoid just gives a probability between 0 and 1 for each label.
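
To make the difference concrete, here is a small sketch in plain Python that applies both functions to the same vector of raw scores. The `logits` values are made up for illustration, chosen so that the sigmoid output lands near the [0.8, 0.4, 0.6, 0.1] example from this thread.

```python
import math

def sigmoid(xs):
    """Apply the sigmoid to each score independently."""
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]

def softmax(xs):
    """Normalize all scores jointly so the outputs sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [1.4, -0.4, 0.4, -2.2]  # hypothetical raw scores, one per label
sig = sigmoid(logits)
soft = softmax(logits)

print([round(p, 2) for p in sig], round(sum(sig), 2))
print([round(p, 2) for p in soft], round(sum(soft), 2))
```

The sigmoid outputs are independent, so several labels can be above 0.5 at once; the softmax outputs compete with each other for a total of 1.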

The sigmoid function is used for two-class logistic regression, whereas the softmax function is used for multiclass logistic regression (a.k.a. MaxEnt, multinomial logistic regression, softmax regression, maximum entropy classifier).

This answer is from a regression point of view, but it generalizes.

Yes, I know what you mean, but what I don’t get is this: a sigmoid just outputs a probability in [0, 1], so in a multi-label problem, how could we use just 1 sigmoid cell to get something like [0.8, 0.4, 0.6, 0.1, …] (when the real label is [1, 0, 1, 0, …])?

Or do we have to use N sigmoid cells, one for each label (each one being a binary classification)?

A sigmoid in the last layer is applied element-wise to its N outputs, so it yields N “binary classifiers” if you train with multi-hot encoded labels of length N, and these produce probabilities like the ones in your example.
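
As a rough illustration of the "N sigmoid cells" idea, the sketch below builds a final layer with one output per label and applies the sigmoid to each output separately. It is plain Python with made-up `features`, `weights` and `biases`; in a real network these would be learned, and the 0.5 threshold is just a common default, not something the thread specifies.

```python
import math

def linear(features, weights, biases):
    """Final layer: one logit per label (weights has one row per label)."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical values: 3 input features, N = 4 labels.
features = [0.5, -1.0, 2.0]
weights = [
    [0.2, -0.1, 0.4],
    [-0.3, 0.5, 0.1],
    [0.0, 0.2, 0.3],
    [0.1, 0.1, -0.6],
]
biases = [0.0, 0.1, -0.2, 0.0]

logits = linear(features, weights, biases)      # N raw scores
probs = [sigmoid(z) for z in logits]            # N independent probabilities
preds = [1 if p > 0.5 else 0 for p in probs]    # threshold each label on its own
print(probs, preds)
```

So there is a single layer, but it has N output units, and each unit with its sigmoid acts as a separate binary classifier for one label.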


I have a question related to multi-label classification. How do I make sure that, during the training phase, accuracy is calculated by considering only the top-2 labels for a given datapoint?
Example: datapoint x got predictions [0.9, 0.8, 0.7, 0.2]. Now, if I consider only the top-2 labels, then the labels corresponding to 0.9 and 0.8 should be used for the accuracy calculation.
Is there any option that lets me make the language_model use a top-k accuracy metric while training?
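
One possible reading of "top-k accuracy" for multi-label data is precision at k: of the k highest-scoring labels, what fraction are actually true. Below is a sketch of that as a plain Python function of predictions and targets. The names `top_k_indices` and `precision_at_k` are hypothetical, the target vector is made up (the example above gives only predictions), and how to register such a function as a training metric depends on the library version, which is not shown here.

```python
def top_k_indices(scores, k):
    """Indices of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def precision_at_k(batch_scores, batch_targets, k=2):
    """Fraction of the top-k predicted labels that are true, over a batch."""
    hits = 0
    for scores, targets in zip(batch_scores, batch_targets):
        hits += sum(targets[i] for i in top_k_indices(scores, k))
    return hits / (k * len(batch_scores))

batch_scores = [[0.9, 0.8, 0.7, 0.2]]   # predictions from the example
batch_targets = [[1, 0, 1, 0]]          # assumed ground truth, for illustration
top2 = top_k_indices(batch_scores[0], 2)
score = precision_at_k(batch_scores, batch_targets, k=2)
print(top2, score)
```

With these assumed targets, only the labels scored 0.9 and 0.8 are considered, and one of the two is correct.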