Multi label classification - how does it work?

So, how do training and, later, prediction work for multi-label classification?
Does the model try to predict several categories for an input image?
Doesn't softmax still happen at the last stage of the NN, forcing the outputs to sum to 1? If so, how can we still select more than one output category?
Or does the library transparently change the NN architecture for us and replace the softmax in the last layer with sigmoids?


Jeremy covers multi label classification in lesson 3.


If I remember correctly, the last layer will be adjusted accordingly.

I am asking this question precisely because I am now going through lesson 3 (planet) and wondering how it works. From the outside it looks transparent: we just call the same learner class.
My assumption is that this Learner class creates a slightly different NN architecture depending on the type of data it gets (single or multiple labels).
But I did not see Jeremy talking about that.

Hello, did you figure this out? I would also like to know how fastai changes from softmax to sigmoid…

Hey, @msrdinesh and @vladgets.
I don’t think the last layer is adjusted at all. We use the same classifier; only the metric changes. For example, when we build a handwritten digit classifier, we get a list of 10 probabilities, one for each digit from 0 to 9. When we take the maximum of these 10 probabilities, that’s our prediction, and that is a multiclass (not multilabel) classification problem.

In the same problem, instead of taking the maximum, if we take all the classes with a probability above a certain threshold (0.2 for example) then we get a multilabel problem. So only our metric changes.

If our digit is 4, and the probability of 4 is 0.7 (the maximum), then our model classifies it correctly. But in the multilabel case, you might get both 4 and 9 as the output, since 9 might have a probability of 0.2.
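
To make the two decision rules described above concrete, here is a minimal sketch in plain Python. The probability values are invented for illustration (they do not come from a real model), and the 0.2 threshold is simply the example value from this post.

```python
# Hypothetical probabilities for the digits 0-9 (made-up values that sum to 1).
probs = [0.01, 0.01, 0.01, 0.01, 0.70, 0.02, 0.01, 0.02, 0.01, 0.20]

# Multiclass rule: keep only the single most probable class.
multiclass_pred = max(range(len(probs)), key=lambda i: probs[i])

# Thresholding rule: keep every class whose probability reaches the threshold.
threshold = 0.2
multilabel_pred = [i for i, p in enumerate(probs) if p >= threshold]

print(multiclass_pred)   # index of the largest probability
print(multilabel_pred)   # all indices at or above the threshold
```

With these numbers the first rule returns only 4, while the second returns both 4 and 9.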

I hope this clears up your doubt.


Thank you @dipam7, I understood that. But I have seen in blogs and other resources that using sigmoid as the last activation makes more sense than softmax for multi-label classification, because with sigmoid a sample being predicted as one class doesn’t affect its chance of being predicted as another class. I am wondering why the fastai developers have not implemented it that way. Please correct me if I am wrong.
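
The independence argument can be checked with a small sketch in plain Python. The logit values below are arbitrary assumptions chosen for illustration; the point is only that raising one logit lowers the other softmax outputs but leaves the other sigmoid outputs untouched.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; outputs sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # Each output depends only on its own logit.
    return 1.0 / (1.0 + math.exp(-x))

logits_a = [2.0, 1.0, 0.5]
logits_b = [5.0, 1.0, 0.5]  # same as logits_a, but the first logit is raised

soft_a, soft_b = softmax(logits_a), softmax(logits_b)
sig_a = [sigmoid(x) for x in logits_a]
sig_b = [sigmoid(x) for x in logits_b]

print(soft_a[1], soft_b[1])  # the second class's softmax probability drops
print(sig_a[1], sig_b[1])    # the second class's sigmoid probability is unchanged
print(sum(soft_a), sum(sig_a))  # softmax sums to 1; sigmoid outputs need not
```

Because the sigmoid outputs are not forced to sum to 1, several classes can have a high probability at the same time.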

If you take a look at the file in the fastai library, you’ll see that when you create your data, it creates either a CategoryList (mono-label classification) or a MultiCategoryList (multi-label classification) depending on your case. That choice in turn determines the appropriate loss function (categorical cross-entropy or binary cross-entropy in our case).

Then, in the file, you can see that each loss function is linked to a particular final activation function (CE with softmax and BCE with sigmoid), which is appended at the end of your model when you make predictions.
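
As a rough illustration of that pairing, here is a toy sketch in plain Python. It is not fastai's actual code: the `FINAL_ACTIVATION` table, the `predict` helper, the loss names used as keys, and the 0.5 threshold are all hypothetical names and values chosen for this example.

```python
import math

def softmax(logits):
    # Probabilities that compete with each other and sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_all(logits):
    # One independent probability per class.
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

# Hypothetical lookup: which activation goes with which loss.
FINAL_ACTIVATION = {
    "cross_entropy": softmax,               # mono-label
    "binary_cross_entropy": sigmoid_all,    # multi-label
}

def predict(logits, loss_name, thresh=0.5):
    # Apply the activation paired with the loss, then decode the labels.
    probs = FINAL_ACTIVATION[loss_name](logits)
    if loss_name == "cross_entropy":
        return [max(range(len(probs)), key=lambda i: probs[i])]
    return [i for i, p in enumerate(probs) if p > thresh]

logits = [1.5, -2.0, 0.8]  # made-up raw model outputs
print(predict(logits, "cross_entropy"))
print(predict(logits, "binary_cross_entropy"))
```

The same raw outputs give a single label under the softmax pairing and possibly several labels under the sigmoid pairing, which is why the network body itself does not need to change.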

So in short, fastai is able to figure out if you want to do a mono-label or multi-label classification and adapt your model automatically.


Thanks man! That was very helpful…