A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

DataBlock.from_columns is the update @foobar8675

Yes. @barnacl is right. The newest version with that wasn’t out at the time. The library just updated to 0.0.8 today so I’ve gotta adjust a few things

Got it, thanks @barnacl, @muellerzr

Thanks to @mgloria, the K-Fold notebook is now stratified :slight_smile:

Hi @muellerzr, I'm at a total loss as to where to change the code (if needed at all).
From my understanding, the output of the model (a tensor of length 37 in the Unknown Label example) should be passed through a sigmoid, giving an output between 0 and 1 for each of the 37 classes. We can use the thresh that is being passed in to check which of these 37 activations are higher than thresh and say those classes are present. We can compare this with the targets, calculate the loss, and use it to update the weights.
If we run this in a loop with varying thresholds, we should see which threshold gives us the best accuracy_multi. Here I tried with thresh = 0.01 and 0.99 (for both the loss and the accuracy).
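A minimal sketch of that logic in plain PyTorch (the function name and shapes here are mine, not fastai's), mirroring what accuracy_multi does:

import torch

def accuracy_multi_sketch(logits, targets, thresh=0.5, sigmoid=True):
    "Fraction of (item, class) pairs where the thresholded prediction matches the target."
    if sigmoid: logits = logits.sigmoid()   # squash raw outputs into (0, 1), one per class
    preds = logits > thresh                 # classes whose activation clears the threshold
    return (preds == targets.bool()).float().mean()

# e.g. a batch of 8 items over the 37 pet classes
acc = accuracy_multi_sketch(torch.randn(8, 37), torch.randint(0, 2, (8, 37)))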


Looks like thresh = 0.01 takes longer but gets to the same accuracy. (Q1: Not sure why, though?)
Q2: If I'm not wrong, the thresh we pass to accuracy_multi and BCEWithLogitsLossFlat should be the same; is that correct?
Q3: Since `def decodes(self, x): return x>self.thresh` is a decodes, does that mean it is only used in the inference phase?

@barnacl I can only just now get to this :slight_smile: So ideally, yes, you are right. My initial thought on where we'd adjust this is the call to MultiCategoryBlock, as a parameter. If you follow the paper trail you can see that it eventually assigns a loss function. Inside that loss function is where we'd want to assign this threshold (or leave it blank if not defined).

  2. Yes, they should be the same (a sketch of keeping them in sync follows below).
  3. No, that's not necessarily what decodes means. decodes runs once the transform is done with an item, so here it would check the threshold on the tail end of a call through our model (encodes runs as the item goes in, decodes on the way back out).
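If the goal is just to keep the two thresholds in sync, a sketch (assuming dls is an existing multi-label DataLoaders) is to define the threshold once and pass it to both the loss and the metric:

from functools import partial
from fastai2.vision.all import *

thresh = 0.2   # our chosen threshold, defined once
learn = cnn_learner(dls, resnet34,
                    loss_func=BCEWithLogitsLossFlat(thresh=thresh),
                    metrics=partial(accuracy_multi, thresh=thresh))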

@muellerzr: “If you follow the paper-trail you can see that eventually it assigns a loss function.” Could you please put a link to this?
You shared this previously: https://github.com/fastai/fastai2/blob/master/fastai2/data/transforms.py#L191
and this is the loss definition: https://github.com/fastai/fastai2/blob/23482436498b3ed943ca4838e282924c10f58cb3/fastai2/layers.py#L329. I'm not able to find where the top-n classes are being selected based on the threshold.
If I'm not wrong, the threshold should be used after the sigmoid activation to select the classes.

@barnacl it operates as a sigmoid, in which any values > thresh are set as present. This goes back to nn.BCEWithLogitsLoss, which is a PyTorch function, and we see the sigmoid in the call to that function; it's not a fastai function. To assign said threshold, ideally we would follow this route:

fastai2.data.block.py:

def MultiCategoryBlock(encoded=False, vocab=None, add_na=False):
    "`TransformBlock` for multi-label categorical targets"
    tfm = EncodedMultiCategorize(vocab=vocab) if encoded else [MultiCategorize(vocab=vocab, add_na=add_na), OneHotEncode]
    return TransformBlock(type_tfms=tfm)

From there we look at MultiCategorize, which takes us to fastai2.data.transforms:

class MultiCategorize(Categorize):
    "Reversible transform of multi-category strings to `vocab` id"
    loss_func,order=BCEWithLogitsLossFlat(),1
    def __init__(self, vocab=None, add_na=False):
        self.add_na = add_na
        self.vocab = None if vocab is None else CategoryMap(vocab, add_na=add_na)

    def setups(self, dsets):
        if not dsets: return
        if self.vocab is None:
            vals = set()
            for b in dsets: vals = vals.union(set(b))
            self.vocab = CategoryMap(list(vals), add_na=self.add_na)

    def encodes(self, o): return TensorMultiCategory([self.vocab.o2i[o_] for o_ in o])
    def decodes(self, o): return MultiCategory      ([self.vocab    [o_] for o_ in o])

And NOW we see the loss function definition. From here, if we wanted to, we could set a threshold on the BCEWithLogitsLossFlat call by passing in a thresh, as you saw before. Does this help some?
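In code, since MultiCategorize pins its default loss as a class attribute, one hypothetical (and unofficial) way to change the default threshold before building your DataBlock would be a monkey-patch like this; passing loss_func directly to the Learner works just as well:

from fastai2.data.transforms import MultiCategorize
from fastai2.layers import BCEWithLogitsLossFlat

# Replace the class-level default loss with one using our own threshold,
# so datasets built from MultiCategoryBlock afterwards should pick it up.
MultiCategorize.loss_func = BCEWithLogitsLossFlat(thresh=0.3)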

If we then look at BCEWithLogitsLossFlat inside of fastai2.layers, we get the following:

@delegates(keep=True)
class BCEWithLogitsLossFlat(BaseLoss):
    "Same as `nn.CrossEntropyLoss`, but flattens input and target."
    def __init__(self, *args, axis=-1, floatify=True, thresh=0.5, **kwargs):
        super().__init__(nn.BCEWithLogitsLoss, *args, axis=axis, floatify=floatify, is_2d=False, **kwargs)
        self.thresh = thresh

    def decodes(self, x):    return x>self.thresh
    def activation(self, x): return torch.sigmoid(x)

And here we see that by default, our loss function uses a threshold of 0.5.
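Putting the pieces together (shapes are illustrative; the assert just confirms the sigmoid really is built into the PyTorch loss rather than fastai):

import torch
import torch.nn as nn
from fastai2.layers import BCEWithLogitsLossFlat

logits  = torch.randn(4, 37)                    # raw model outputs: 4 items, 37 classes
targets = torch.randint(0, 2, (4, 37)).float()  # one-hot multi-label targets

# nn.BCEWithLogitsLoss applies the sigmoid internally, so these two agree:
assert torch.allclose(nn.BCEWithLogitsLoss()(logits, targets),
                      nn.BCELoss()(torch.sigmoid(logits), targets), atol=1e-6)

loss_func = BCEWithLogitsLossFlat(thresh=0.2)   # override the 0.5 default
probs = loss_func.activation(logits)            # explicit sigmoid, used when decoding
preds = loss_func.decodes(probs)                # boolean mask: probs > 0.2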

Thanks for sharing this; I'll look at it and get back.

What I find interesting is that one could conceivably provide a value for thresh to accuracy_multi which is DIFFERENT from the thresh value one uses in BCEWithLogitsLossFlat. While both defaults are set to 0.5, there is nothing preventing different values being provided to each. Can anyone think of a reason this would be useful? I can see the need for the thresh value to be a parameter to both when you are using accuracy_multi with some other loss_func, or using F1 as the metric while using BCEWithLogitsLossFlat, but I am wondering whether anyone is aware of use cases beyond these two.

Also, I am aware of the need to be able to provide a different thresh value to different classes, so as to maximize, say, the overall F1 score or the accuracy per class. Has anyone figured out how to do that already?

I agree with you; I think it would be safer to pass that as a single parameter. Not sure why they are separate. I still don't fully understand how it currently works.

This is why I was saying a PR may be nice. It's separate because we use it as a metric, not as the loss function, and the above details where you would go to put that in :slight_smile:

I agree with you, just not sure why it works correctly if that is the case.

It works correctly because we explicitly state it for our metric (which is why they're different).

Is there a way in fastai v2 to provide a different thresh value to different classes in multi-label classification, rather than just a single value for all classes? While theoretically this could be useful, and I have seen this usage in some Kaggle kernels, I would also like to know whether anyone has actually used such an approach of different thresholds for different classes in practice.

This would be done via some form of custom loss function or metric (and this would be PyTorch code). You'd have to explicitly say what each thresh is and check against that tensor, but it could be done.

Probably similar to something like a weighted loss function.
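A hypothetical sketch of that idea in PyTorch (none of these names are fastai APIs): keep one threshold per class and broadcast the comparison over the batch:

import torch

n_classes = 37
threshs = torch.full((n_classes,), 0.5)   # start from the usual 0.5 everywhere
threshs[3] = 0.2                          # e.g. loosen the threshold for one rare class

def decode_per_class(logits, threshs):
    "Boolean mask of predicted classes, one threshold per class."
    return torch.sigmoid(logits) > threshs   # (bs, n_classes) > (n_classes,) broadcasts

preds = decode_per_class(torch.randn(8, n_classes), threshs)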

I was re-visiting some of the notebooks, more precisely pets. I realized we create the dataloaders using two different methods:

Method 1

Method 2

Note the change DataBlock -> Datasets
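Since the screenshots don't come through here, a rough sketch of the two approaches (path is assumed to point at the extracted Pets images; the exact regex and transform lists are assumptions, not the notebook verbatim):

from fastai2.vision.all import *

# Method 1: the high-level DataBlock API wires the pipeline for you
pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=RegexLabeller(pat=r'([^/]+)_\d+.jpg$'),
                 item_tfms=Resize(224))
dls = pets.dataloaders(path)

# Method 2: the mid-level Datasets API, where you list each transform pipeline yourself
items  = get_image_files(path)
splits = RandomSplitter()(items)
tfms   = [[PILImage.create], [RegexLabeller(pat=r'([^/]+)_\d+.jpg$'), Categorize]]
dsets  = Datasets(items, tfms, splits=splits)
dls    = dsets.dataloaders(after_item=[Resize(224), ToTensor],
                           after_batch=[IntToFloatTensor])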

I am, however, not able to map it to the slides (where is Datasets?):

Thanks a lot for sharing your wisdom :wink:

Thanks a lot for that catch @mgloria! Datasets should be the lowest level (instead of Pipeline). I'll try to put that edit in before class if I can.
