Classification with length-varying labels

phinate · March 30, 2020, 11:02am

Hi all!

A fun application I thought of after watching lesson 2 was to try and classify different Pokemon images by their type. A problem with this is that Pokemon have both a primary and secondary type, e.g.

type(Pikachu) = ['electric']

type(Charizard) = ['fire','flying']

While this is just one example, I think classification with variable-length labels is an interesting problem. Does anyone know how this is approached in practice?

My ideas were to introduce a ‘None’ type such that type(Pikachu) = ['electric','None'], but that could lead to ['None', ...] predictions, which wouldn’t make sense. Alternatively, one could treat ['fire'] and ['fireFlying'] as unique types, but I feel like that’s bypassing the problem (e.g. the number of combined labels grows combinatorially as more types are added).

oneironaut · March 30, 2020, 1:02pm

What you’re trying to do here is multi-label classification. If the 2020 course is somewhat similar to the previous courses, there will be a dedicated lesson on this topic, probably the next one or the one after that.

So stay tuned! By the way, that’s an awesome idea you have there

gokkulnath · March 30, 2020, 3:13pm

Hey!

That's an interesting application of multi-label classification. But I think you are a bit confused. For example, when predicting on a Pikachu image, the model returns a list of float numbers. Since it is a multi-label problem, these float values will lie between 0 and 1 (i.e. equivalent to probabilities).

Since we have predictions (I mean float values) for all labels, it should not be a problem to return an arbitrary number of classes. You might have to write a function that thresholds the outputs and returns only the classes above the threshold, instead of all the classes (the default). Have a look at the multi-label example here: Link
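
A minimal sketch of the thresholding step described above, in plain Python. The label list, the logits, and the helper name `predict_types` are made up for illustration and are not part of any fast.ai API; a real model would supply the raw outputs.

```python
import math

def sigmoid(x: float) -> float:
    """Squash a raw score (logit) into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_types(logits, labels, threshold=0.5):
    """Return (label, probability) pairs whose probability meets the threshold.

    Each label is scored independently, so zero, one, or several labels
    can be returned for a single image.
    """
    probs = [sigmoid(z) for z in logits]
    return [(lab, p) for lab, p in zip(labels, probs) if p >= threshold]

# Hypothetical label set and raw model outputs, for illustration only.
labels = ["electric", "fire", "flying", "water"]
pikachu_logits = [3.0, -2.0, -1.5, -4.0]
charizard_logits = [-3.0, 2.5, 1.2, -2.0]

pikachu_pred = [lab for lab, _ in predict_types(pikachu_logits, labels)]
charizard_pred = [lab for lab, _ in predict_types(charizard_logits, labels)]

print(pikachu_pred)    # one label passes the threshold
print(charizard_pred)  # two labels pass the threshold
```

The number of returned labels is not fixed in advance, which is what lets the same model handle single-type and dual-type Pokemon.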

phinate · March 30, 2020, 4:59pm

Thanks for the link.

I think there’s maybe a subtle difference here, but please correct me if I’m wrong!

I’m well aware that the returned outputs would correspond to different classes. But if a Pokemon had types a and b, I don’t think I’d be content with something like a score of 0.6 for a and 0.4 for b, then assigning both classes as the type. What I really want is for the probability of type[0] = a and type[1] = b to be high, rather than taking the two highest values above a threshold.

I think the main difference for this problem is that I’m interested in estimating the joint probability distribution of the primary type and the secondary type, or equivalently the distribution of the primary type together with the conditional distribution of the secondary type given the primary type.
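
One way to sketch the joint-distribution idea is with the chain rule, P(primary = a, secondary = b) = P(primary = a) * P(secondary = b | primary = a). The snippet below is only an illustration under assumptions that are not in this thread: two hypothetical softmax output heads, invented scores, and a 'none' class that is allowed only in the secondary slot, so a ['None', ...] prediction cannot occur.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

primary_types = ["electric", "fire", "flying"]
# 'none' exists only for the secondary slot (assumption for this sketch).
secondary_types = ["none", "electric", "fire", "flying"]

# Hypothetical raw scores for one image: one vector for the primary type,
# and one vector of secondary scores for each possible primary type.
primary_scores = [0.1, 2.0, 0.5]
secondary_scores = {
    "electric": [2.0, -5.0, 0.0, 0.5],
    "fire": [0.5, 0.0, -5.0, 2.0],
    "flying": [1.0, 0.0, 0.5, -5.0],
}

# Chain rule: P(first, second) = P(first) * P(second | first)
p_primary = softmax(primary_scores)
joint = {}
for first, p_first in zip(primary_types, p_primary):
    p_second_given_first = softmax(secondary_scores[first])
    for second, p_cond in zip(secondary_types, p_second_given_first):
        joint[(first, second)] = p_first * p_cond

# The most probable ordered (primary, secondary) pair.
best = max(joint, key=joint.get)
print(best, round(joint[best], 3))
```

Because the pairs are ordered, ('fire', 'flying') and ('flying', 'fire') get separate probabilities, which is the distinction a plain multi-label threshold does not capture.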

gokkulnath · March 30, 2020, 5:21pm

If I understand your explanation, what you describe is returning top-k predictions, which is usually done in multi-class classification problems. But this is a multi-label problem: the last layer uses a sigmoid activation instead of softmax. In this case the outputs of the last layer need not sum to 1. They are individual class probabilities.
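
To make the sigmoid versus softmax difference concrete, here is a small sketch in plain Python. The logits are invented for illustration; the point is only that softmax outputs compete for a total of 1, while sigmoid outputs are scored independently.

```python
import math

def softmax(scores):
    """Multi-class activation: outputs sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Multi-label activation: each output lies in (0, 1) on its own."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical logits for three types, two of which fit the image well.
logits = [2.0, 1.5, -3.0]

softmax_probs = softmax(logits)
sigmoid_probs = [sigmoid(z) for z in logits]

print(sum(softmax_probs))  # 1 up to floating-point rounding
print(sum(sigmoid_probs))  # not constrained to 1
print(sigmoid_probs)       # the first two values can both be high
```

With softmax, two correct labels have to share the probability mass; with sigmoid, both can be close to 1 at the same time.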

Which means that when your model predicts type b to be 0.4, it is only 40% confident. If you have a well-trained and optimized model, it should predict high probabilities for the correct labels (i.e. both a and b will be high). Also, I recommended thresholding only to suppress predictions with very low confidence (say less than 0.2, i.e. 20%), as the model could still predict that the image is of some other type c.
I guess you can have a look here for more clarity: Link
P.S.: The video is from a previous version of Fast.ai. The library has had API changes, but the concepts remain the same.