Doubts regarding multi-label classification

Yes, yes, all of it is of interest :slight_smile: and bounding boxes too, because ideally it would be awesome to get as much info as possible from that image: how many dogs, what class each is, etc.

Awesome! It's certainly an interesting idea. I'll do it on the Zoom chat here in, say, 10 minutes or so, and I'll post about it on the study group thread too :slight_smile:

@kodzaks, here is the link to it if you (or anyone reading this now) needs it:

Zoom

https://scikit-learn.org/stable/modules/multiclass.html

Video discussion: part 1, part 2

Thank you so much! It was super useful!

My pleasure :slight_smile:

Hi @muellerzr. I am trying to run this notebook you shared and I get the same error I posted here

Any suggestions on how I can resolve this?

I tested just now. Notebook seems fine to me. See my reply here. Thanks.

Yijin

Thanks. I did manage to find a more convoluted workaround, but I will take a look.

I guess the code checks whether the imdb_tok folder exists, and creates the folder and the pickle file only if it does not exist. So you have to remove the folder for it to recreate the pickle file.
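In code, that would be something like this (a minimal sketch, assuming the default fastai data location; adjust the path if yours differs):

import shutil
from pathlib import Path

# Default untar_data location for the tokenized IMDB folder -- an assumption on my part
tok_path = Path.home()/'.fastai'/'data'/'imdb_tok'
if tok_path.exists():
    shutil.rmtree(tok_path)  # removing it forces re-tokenization, which recreates the pickle file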

Hey all, this thread is really useful, thanks to everyone. I've taken Zach's notebook and applied it to the bears problem we tackled in an early chapter of fastbook. My goal was to output nothing when a bear was not present (as in Zach's example with the donkey).

Question 1
The video (the first Zoom link) suggests that the threshold value provided is used in training (at least I think it does) and referenced within learn.predict, but I can't find any evidence of that in the code. I feel like I must be missing something. I appreciate it's definitely used in accuracy_multi to report our accuracy metric during training, but I don't see where else it is referenced.
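For reference, fastai's accuracy_multi looks roughly like this (paraphrased from the source), so as far as I can tell thresh only affects the reported metric:

def accuracy_multi(inp, targ, thresh=0.5, sigmoid=True):
    "Compute accuracy when `inp` and `targ` are the same size."
    inp,targ = flatten_check(inp,targ)
    if sigmoid: inp = inp.sigmoid()   # logits -> probabilities
    return ((inp>thresh)==targ.bool()).float().mean()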

Ignoring the accuracy metric, training using different thresholds doesn't appear to make any noticeable difference.

For example, this is the learn.predict output on a cityscape of New York:

((#2) ['grizzly','teddy'],
 tensor([False, True, True]),
 tensor([0.0607, 0.6589, 0.6534]))

for a learner with a super low threshold:

learn = cnn_learner(dls, resnet34, pretrained=True, metrics=[partial(accuracy_multi, thresh=0.01)], loss_func=BCEWithLogitsLossFlat())

If learn.predict were referencing the threshold, the output should be False for everything, right?
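To double-check, I tried applying a threshold by hand to the raw probabilities from the same New York image ('nyc.jpg' is just my local file):

labels, decoded, probs = learn.predict('nyc.jpg')  # probs = tensor([0.0607, 0.6589, 0.6534])
print(probs > 0.01)  # tensor([True, True, True]) -- everything passes a super low threshold
print(probs > 0.9)   # tensor([False, False, False]) -- nothing passes a high one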

Question 2
Would we expect a model trained on just a few classes to be more generous (i.e. requiring a higher threshold) than a model trained on a large number of classes?

I've found that our bear classifier, with just three classes, gives extremely high outputs (> 0.6!) for bears being present in images that are completely unrelated, such as cityscapes. In contrast, Zach's model, trained on 37 classes, seems to be far more conservative with its outputs.

It's in the loss function. Specifically, BCEWithLogitsLossFlat's decodes function :slight_smile:

I'd say it depends on problem difficulty. Teddy vs. other bears is a fairly straightforward problem, but pets isn't, per se (though I am just speculating). If we had the argmax, as regular classification does, I'd agree. I'd look at the similar/commonly confused classes and see their probabilities to understand your true threshold, something like the sketch below :slight_smile:
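A rough sketch of what I mean (validation set; eyeballing how separated each class's probabilities are when it is present vs. absent):

# Sigmoid probabilities and one-hot targets for the validation set
probs, targs = learn.get_preds()
for i, name in enumerate(learn.dls.vocab):
    present = probs[targs[:, i] == 1, i].mean().item()  # avg prob when class is in the image
    absent  = probs[targs[:, i] == 0, i].mean().item()  # avg prob when it isn't
    print(f'{name}: present={present:.3f}, absent={absent:.3f}')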

Thanks so much for the reply Zach. I have a couple of follow-on questions; apologies, I've just been really struggling with this concept and want to make sure I don't go away with an incorrect understanding:

1.
I see the threshold in the loss function now, thank you. So that answers the question about its application in training. However, in your notebook, you are passing a thresh of 0.1 into the metrics, but not into the loss function. Is this passed through to the loss function somehow?

learn = cnn_learner(dls, resnet34, pretrained=True, metrics=[partial(accuracy_multi, thresh=0.1)], loss_func=BCEWithLogitsLossFlat())

2.
Even if it's included in the loss function, that still doesn't answer the question of what threshold learn.predict uses, right? I saw a post for fastai v1 (I wish I could find it again, but no joy) that appeared to suggest we need to use the raw output values from learn.predict to predict the classes ourselves, rather than relying on its class output, i.e.

learn.predict('donkey.jpg')[2] > threshold

instead of just using the classes provided by

learn.predict('donkey.jpg')[0]

This was fixed in v2 in that decodes function. Specifically, see here:

def predict(self, item, rm_type_tfms=None, with_input=False):
    dl = self.dls.test_dl([item], rm_type_tfms=rm_type_tfms, num_workers=0)
    inp,preds,_,dec_preds = self.get_preds(dl=dl, with_input=True, with_decoded=True) # Here
    i = getattr(self.dls, 'n_inp', -1)
    inp = (inp,) if i==1 else tuplify(inp)
    dec = self.dls.decode_batch(inp + tuplify(dec_preds))[0]
    dec_inp,dec_targ = map(detuplify, [dec[:i],dec[i:]])
    res = dec_targ,dec_preds[0],preds[0]
    if with_input: res = (dec_inp,) + res
    return res

We can see that we run get_preds with with_decoded. What this entails is that we run the decodes of the loss function, which applies the loss function's threshold. So yes, we need to adjust BCEWithLogitsLossFlat's threshold as well, since the metric and the loss function are two completely separate entities. You should pass the same threshold to the BCE loss too:

class BCEWithLogitsLossFlat(BaseLoss):
    "Same as `nn.BCEWithLogitsLoss`, but flattens input and target."
    def __init__(self, *args, axis=-1, floatify=True, thresh=0.5, **kwargs):
        super().__init__(nn.BCEWithLogitsLoss, *args, axis=axis, floatify=floatify, is_2d=False, **kwargs)
        self.thresh = thresh

    def decodes(self, x):    return x>self.thresh # See here, that `decodes` again, applying the threshold
    def activation(self, x): return torch.sigmoid(x)
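For example, with the probabilities from your New York picture (a quick sketch; get_preds applies the sigmoid activation before decodes runs):

import torch

loss_func = BCEWithLogitsLossFlat(thresh=0.9)
probs = torch.tensor([0.0607, 0.6589, 0.6534])  # sigmoid outputs from your example
print(loss_func.decodes(probs))                 # tensor([False, False, False]) -> no labels predicted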

Does this clear things up somewhat? :slight_smile:

Yes, that really helps, and my results now make sense. Thank you so much; that's been bugging me all weekend. Really appreciate it.

learn = cnn_learner(dls, resnet34, pretrained=True, metrics=[partial(accuracy_multi, thresh=0.9)], loss_func=BCEWithLogitsLossFlat(thresh=0.9)) was all I needed :slight_smile:

Speaking of the scikit-learn link you posted, is it possible to do "multioutput-multiclass classification" (also known as multitask classification) for NLP through fastai?

This is described as "a classification task which labels each sample with a set of non-binary properties. Both the number of properties and the number of classes per property is greater than 2."

Thanks!

If you can do it in PyTorch, you can manage to do it in fastai with a little work, so yes.

I have yet to venture into PyTorch, so I'm not sure if it can be done there. Nevertheless, are you saying fastai doesn't currently have that feature built in, and that implementing it would take work and knowledge of PyTorch?

It's not prebuilt, no. That's an uncommon and specific task, so odds are it's not in there. You can do it if you understand how to wire it up, but it's not built in. This sounds like it would be an adjustment to your loss function, so I'd start by looking at loss functions and model outputs, along the lines of the sketch below.
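A minimal PyTorch sketch of the idea (the two "properties" and their class counts here are made up, and all names are placeholders):

import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    "One shared feature vector in, one classification head per property."
    def __init__(self, n_features=512, n_color=5, n_breed=37):
        super().__init__()
        self.color = nn.Linear(n_features, n_color)  # property 1: 5 classes
        self.breed = nn.Linear(n_features, n_breed)  # property 2: 37 classes
    def forward(self, x):
        return self.color(x), self.breed(x)

def multitask_loss(outputs, color_targ, breed_targ):
    "Sum one cross-entropy per property."
    color_out, breed_out = outputs
    ce = nn.CrossEntropyLoss()
    return ce(color_out, color_targ) + ce(breed_out, breed_targ)

feats = torch.randn(8, 512)  # e.g. pooled features from a model body
loss = multitask_loss(MultiTaskHead()(feats), torch.randint(0, 5, (8,)), torch.randint(0, 37, (8,)))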

I have a dataset similar to the planet dataset from v1. I built a very nice model which works fine, and now I'm trying to adapt it to the new fastai. The new multi-class approach (lecture 6) is different from what I'm trying to do, especially the part about labeling samples from a folder and applying transforms.

Has anyone tried the planet dataset with the new fastai?

I'm having issues with how to label my images.

In v1 I simply do the following:

src = (fnames.from_folder(path)
       .split_by_rand_pct(0.1)
       .label_from_folder(label_delim=' '))

Is that impossible now?
Is there any guide on how to do this with the new fastai?
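A minimal sketch of the v2 equivalent using the DataBlock API, assuming a planet-style CSV with image_name and space-delimited tags columns (the sample dataset, paths, and column names here are assumptions):

from fastai.vision.all import *

path = untar_data(URLs.PLANET_SAMPLE)  # small planet sample shipped with fastai
df = pd.read_csv(path/'labels.csv')

dblock = DataBlock(
    blocks=(ImageBlock, MultiCategoryBlock),           # multi-label targets
    get_x=ColReader('image_name', pref=f'{path}/train/', suff='.jpg'),
    get_y=ColReader('tags', label_delim=' '),          # plays the role of v1's label_delim=' '
    splitter=RandomSplitter(valid_pct=0.1),            # plays the role of v1's split_by_rand_pct(0.1)
    item_tfms=Resize(224))
dls = dblock.dataloaders(df)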