Default metrics in multi-label classification

Alex.Nikitin · March 5, 2018, 10:43am

Hi!
I would like to know what the default metric is for multi-label classification. In binary classification it is shown as ‘accuracy’, but in multi-label classification it is shown as ‘lambda’.
Thanks!

ibunny · March 12, 2018, 11:00am

Hi!
Do you have any idea how the sigmoid works for a multi-label problem?
I got stuck in the satellite notebook. Is it true that we just have to replace the softmax with a sigmoid in the final layer? But how could a sigmoid output an N-d vector? Or do we have to train N sigmoids, one for each of the N labels?
What I don’t understand is how to calculate the loss for multi-label classification.
If you have any clue about this, please share some of the details, thanks!

ecdrid · March 12, 2018, 11:14am

Use the cross entropy loss function (for multi-label, binary cross entropy applied to each label separately and then averaged).
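
As a rough illustration of how that loss can be computed for one example, here is a minimal pure-Python sketch. The function names `sigmoid` and `multilabel_bce` and the sample logits are made up for this illustration and are not taken from the notebook or from any library. It uses the direct formula for readability; real training code usually works on logits in a numerically stable form.

```python
import math

def sigmoid(z):
    # Squash one raw score (logit) into a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_bce(logits, targets):
    """Mean binary cross entropy over the N labels of a single example.

    logits  : list of N raw scores, one per label
    targets : list of N values in {0, 1} (multi-hot encoded labels)
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        # Each label contributes its own two-class cross entropy term.
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

# Hypothetical scores for 4 labels; the true labels are [1, 0, 1, 0].
logits = [1.4, -0.4, 0.4, -2.2]
targets = [1, 0, 1, 0]
loss = multilabel_bce(logits, targets)
print(loss)
```

Each label is scored as its own yes/no problem, so the loss goes down when every individual probability moves toward its 0 or 1 target.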

ibunny · March 12, 2018, 11:56am

Thanks! But what I don’t understand is how a sigmoid could output something like [0.8, 0.4, 0.6, 0.1, …]. I mean, sigmoid is just 1-d, right?

ecdrid · March 12, 2018, 12:07pm

Sigmoid and softmax are not the same thing.
With a softmax, the probabilities necessarily add up to 1.

Alex.Nikitin · March 12, 2018, 12:08pm

Sigmoid just gives a probability between 0 and 1 for each label, independently of the other labels.
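
To make the difference concrete, here is a small sketch in plain Python that applies both functions to the same vector of scores. The helper names and the sample logits are assumptions chosen for illustration.

```python
import math

def sigmoid_each(logits):
    # Apply the 1-d sigmoid to every score separately.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical raw scores for 4 labels.
logits = [1.4, -0.4, 0.4, -2.2]

sig = sigmoid_each(logits)   # roughly [0.80, 0.40, 0.60, 0.10]
soft = softmax(logits)       # competes across labels, sums to 1

print([round(p, 2) for p in sig], sum(sig))
print([round(p, 2) for p in soft], sum(soft))
```

The sigmoid outputs do not have to sum to 1, so several labels can be "on" at once. The softmax outputs always sum to 1, which fits the case where exactly one class is correct.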

ecdrid · March 12, 2018, 12:09pm

The sigmoid function is used for two-class logistic regression, whereas the softmax function is used for multiclass logistic regression (a.k.a. MaxEnt, multinomial logistic regression, softmax regression, maximum entropy classifier).

This answer is from a regression point of view, but it generalizes.

ibunny · March 12, 2018, 12:12pm

Yes, I know what you mean, but what I don’t get is this: a sigmoid just outputs a probability in [0, 1], so for a multi-label problem, how could we use just 1 sigmoid cell to get something like [0.8, 0.4, 0.6, 0.1, …] (when the real label is [1, 0, 1, 0, …])?

Or do we have to use N sigmoid cells, one for each label (each one being a binary classification)?

joschka · March 19, 2018, 1:33pm

A sigmoid applied to a last layer with N outputs yields N “binary classifiers” if you train with N-dimensional multi-hot encoded labels, and these produce probabilities like the ones in your example.
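
A toy sketch of that idea in plain Python, under the assumption that the last layer is an ordinary linear layer with N output units. The weights, features, and function names below are invented for illustration and do not come from a trained model.

```python
import math

def linear(x, weights, biases):
    # One row of weights per output unit: N rows give N logits.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def predict_labels(x, weights, biases, threshold=0.5):
    logits = linear(x, weights, biases)
    # The same 1-d sigmoid is applied to each of the N logits.
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Each output is an independent yes/no decision for its label.
    labels = [1 if p >= threshold else 0 for p in probs]
    return probs, labels

# Hypothetical 2-feature input and a 4-label output layer.
features = [1.0, 2.0]
weights = [[0.4, 0.5], [-0.2, -0.1], [0.2, 0.1], [-1.0, -0.6]]
biases = [0.0, 0.0, 0.0, 0.0]

probs, labels = predict_labels(features, weights, biases)
print([round(p, 2) for p in probs])
print(labels)
```

So there is no single "N-d sigmoid": the layer produces N numbers and the sigmoid is applied element-wise, giving one probability per label.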

ibunny · March 19, 2018, 2:07pm

thanks~

saikrishnab · July 18, 2019, 11:29am

I have a question related to multi-label classification. How do I make sure that, during training, accuracy is calculated by considering only the top-2 labels for a given datapoint?
Example: datapoint x gets predictions [0.9, 0.8, 0.7, 0.2]. If I consider only the top-2 labels, then the labels corresponding to 0.9 and 0.8 should be used for the accuracy calculation.
Is there any option to make the language_model use a top-k accuracy metric while training?
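
One possible reading of "top-k accuracy" here is precision at k: of the k highest-scoring labels, what fraction are actually true. The sketch below shows only that calculation in plain Python. The function names and the target vector are hypothetical, this is not a built-in option of any library, and how to plug such a function into a training loop depends on the library and is not shown.

```python
def top_k_indices(scores, k):
    # Indices of the k largest scores, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def precision_at_k(scores, targets, k=2):
    """Fraction of the top-k predicted labels that are true labels."""
    top = top_k_indices(scores, k)
    return sum(targets[i] for i in top) / k

def mean_precision_at_k(batch_scores, batch_targets, k=2):
    # Average the per-example value over a batch.
    values = [precision_at_k(s, t, k)
              for s, t in zip(batch_scores, batch_targets)]
    return sum(values) / len(values)

# Predictions from the example above, with an assumed target vector.
scores = [0.9, 0.8, 0.7, 0.2]
targets = [1, 0, 1, 0]

top2 = top_k_indices(scores, 2)            # labels scored 0.9 and 0.8
value = precision_at_k(scores, targets, 2)
print(top2, value)
```

With these assumed targets, only one of the two top-scoring labels is correct, so the value for this datapoint is 0.5.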