Multi-label classification for images and more?

Hi Everyone!

Just finished lesson 7 of Jeremy’s course! It’s great, and thank you for all the effort, Jeremy!

I’m wondering if anyone has experience with, or can point me in the right direction on, how one would structure a NN for multi-label classification. My understanding is that it requires multiple output neurons, with a binary classification for each possible label, but I’m wondering if that’s true?

I’m trying to build a multi label image classifier to help my friend in Ethiopia with some NGO farming projects, but I wonder if I am getting in over my head!

If you have any experience with it or can point me in the right direction, that would be very helpful.



Without googling first (uh oh), my gut reaction would be to make a model per class. In particular, I’d take VGG, reset the dense layers, and train with binary labels. All those forward passes might be too expensive in production, though. With some more thought or some googling I think we could figure out a single-model solution.
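In case it helps, here’s a tiny numpy sketch of the data side of that idea: turning one multi-label dataset into a binary target vector per class, so each per-class VGG-headed model just sees present/absent. The class names and labels are made up:

```python
import numpy as np

classes = ["person", "maize", "teff"]  # made-up class names

# Multi-hot labels: each row is one image, each column one class.
labels = np.array([
    [1, 0, 1],   # image 0: person + teff
    [0, 1, 0],   # image 1: maize only
    [1, 1, 1],   # image 2: all three
])

# One binary target vector per class: each per-class model
# would be trained against one of these columns.
binary_targets = {name: labels[:, i] for i, name in enumerate(classes)}

print(binary_targets["person"])  # [1 0 1]
```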


More thoughts:

Hmm. The targets could be sparse, like a one-hot encoding, but with more than one class allowed to be turned on in the vector. I wonder how this would interact with the softmax layer. Maybe we wouldn’t want the outputs to sum to 1 anymore, or maybe that wouldn’t matter. Would we pick a threshold? How would we find it? Hmm.
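To make the softmax-vs-threshold question concrete, here’s a toy numpy sketch (the logits are made-up numbers) contrasting softmax outputs, which always sum to 1 and so compete with each other, against independent sigmoids thresholded at 0.5:

```python
import numpy as np

# Hypothetical 3-class example: an image containing classes 0 and 2.
# A multi-hot target simply allows several 1s in the vector.
target = np.array([1.0, 0.0, 1.0])

# Raw scores (logits) from the network's final layer (made-up numbers).
logits = np.array([2.0, -1.0, 0.5])

# Softmax forces the outputs to compete: they sum to 1,
# so two strong classes suppress each other.
softmax = np.exp(logits) / np.exp(logits).sum()

# Independent sigmoids don't compete: each output is its own
# binary "is this class present?" probability.
sigmoid = 1.0 / (1.0 + np.exp(-logits))

# A simple decision rule: predict a class as present above a threshold.
threshold = 0.5
predicted = (sigmoid > threshold).astype(float)

print(softmax.sum())  # ~1.0 by construction
print(predicted)      # [1. 0. 1.] -- matches the multi-hot target here
```

This is why multi-label setups usually swap softmax for per-class sigmoids with a binary cross-entropy loss on each output.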


One idea would be to start by training a separate model for each class as a benchmark. Then, you could combine those models into a larger model (say, by using Keras’ merge function), putting some dense layers on it (or maybe even more convolutional layers), and comparing how it does against the benchmark.
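A rough numpy sketch of that merge step, with the specialist outputs simulated by random numbers (in Keras you’d merge the actual sub-models and train the layers on top):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated per-image probability outputs from three specialist models.
specialist_outputs = [rng.random(1), rng.random(1), rng.random(1)]

# "Merge" by concatenation: one feature vector per image...
merged = np.concatenate(specialist_outputs)

# ...then a small trainable dense layer (random weights here)
# maps the merged features back to one score per class.
W = rng.standard_normal((3, 3))
b = np.zeros(3)
scores = merged @ W + b
probs = 1.0 / (1.0 + np.exp(-scores))

print(merged.shape, probs.shape)  # (3,) (3,)
```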

Maybe the larger model would perform better because it has more information and capacity for learning, but maybe it would perform worse in misleading ways, and this might depend on the class. Maybe combining two specialist networks A and B would perform better than A on A’s class, but worse than B on B’s class. In that case you could use the larger network for A’s class and use B for B’s class.
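That per-class pick could be as simple as comparing validation scores (the accuracies below are hypothetical):

```python
# Hypothetical validation accuracies per class for the combined
# model and for each specialist.
combined = {"A": 0.91, "B": 0.84}
specialist = {"A": 0.88, "B": 0.90}

# Per class, keep whichever model validated better.
chosen = {
    c: ("combined" if combined[c] >= specialist[c] else "specialist")
    for c in combined
}

print(chosen)  # {'A': 'combined', 'B': 'specialist'}
```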


Here’s a paper (2014): CNN: Single-label to Multi-label

I couldn’t figure out what they’re doing from the abstract, but mentioning semantic segmentation gave me an expensive idea for multi-label classification. You could segment the images into regions and label those regions (e.g. people are here, crop X is there, crop Z is there) for your data. Then, you could train a semantic segmentation network on this data. Then, after doing a forward pass, you could consider an entity to be present if the number (or percentage) of pixels corresponding to it is above a threshold. I think the threshold would have to be different for each class. And spatial resolutions would affect it and so would how much the object fills the image…this idea is dying fast.
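For what it’s worth, the pixel-counting step of that idea is cheap to sketch in numpy (the toy mask and per-class thresholds are made up):

```python
import numpy as np

# A toy 4x4 segmentation mask: each pixel holds a class id
# (0 = background, 1 = person, 2 = crop).
mask = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 0, 0, 0],
    [2, 0, 0, 0],
])

# Fraction of pixels assigned to each class.
total = mask.size
fractions = {c: (mask == c).sum() / total for c in (1, 2)}

# Per-class presence thresholds (made up; as noted above, these
# would likely need tuning per class and per image scale).
thresholds = {1: 0.10, 2: 0.20}
present = {c: fractions[c] >= thresholds[c] for c in thresholds}

print(fractions)  # {1: 0.25, 2: 0.125}
print(present)    # {1: True, 2: False}
```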

Okay. I hope these thoughts helped. To go further it would help to know more about your problem; for example, knowing the kinds of things you want to detect and the nature of the images (e.g. will they be taken from a satellite, a drone, or a person?).

I’m not sure what you mean by multi-label, but note that the VGG16 ImageNet classifier is multi-class rather than multi-label: it distinguishes 1,000 different classes, though its softmax assigns a single label per image. In cats and dogs we changed the final dense layer to output just 2, and in State Farm 10.

Here’s a keras implementation of something that can label multiple items per image: . It also localizes them. Is this what you’re looking for?

Congrats on completing lesson 7!