A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

muellerzr · February 14, 2020, 7:01am

Most any layer that looks different or odd is in the layers.py file. If you look you can see PixelShuffle is just nn.PixelShuffle (plus a few bits)
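
For anyone who wants to see what the underlying rearrangement does without opening the source, here is a minimal sketch in plain Python on nested lists. The name pixel_shuffle_sketch is made up for illustration. It follows the index mapping documented for nn.PixelShuffle (input of shape (C*r*r, H, W) to output of shape (C, H*r, W*r)) and leaves out the batch dimension and the extra bits fastai adds around it.

```python
def pixel_shuffle_sketch(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Hypothetical helper for illustration only; it mirrors the index
    mapping of nn.PixelShuffle for a single image (no batch dimension).
    """
    n_in, h, w = len(x), len(x[0]), len(x[0][0])
    assert n_in % (r * r) == 0, "channels must be divisible by r*r"
    c_out = n_in // (r * r)
    out = [[[None] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        out[c][y * r + i][z * r + j] = src[y][z]
    return out

# Four 1x1 channels become one 2x2 channel when r=2.
x = [[[1]], [[2]], [[3]], [[4]]]
shuffled = pixel_shuffle_sketch(x, 2)
print(shuffled)  # [[[1, 2], [3, 4]]]
```

So channels are traded for spatial resolution, which is why it shows up in upsampling blocks.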

kshitijpatil09 · February 14, 2020, 7:49am

True. But I want to know the theory behind it and why it’s used so extensively by fastai. I believe Jeremy once said he’d explain it in the second part of the course, but I couldn’t find any reference to that. I’m looking for an article or paper expounding on this topic.

muellerzr · February 14, 2020, 2:53pm

@kshitijpatil09 Fastai Unet

Srinivas · February 14, 2020, 4:37pm

For pixel shuffle specifically the Torch code itself references this paper:

foobar8675 · February 14, 2020, 5:31pm

For PyTorch tutorials, I personally liked this one. It is more of an intro: https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

barnacl · February 14, 2020, 5:32pm

Found a few answers after exploring the code as @muellerzr suggested.
If you pass just slice(end), then the last group's learning rate is end and all the other groups get end/10; see here (look under lr_range). What role slice plays can be seen here.
So now we get three lrs, i.e. [lr/10, lr/10, lr]. How is the network split so we can apply these different lrs?
This happens in the Learner, which takes a parameter called splitter. Splitters are functions defined per architecture family.
As we are dealing with a resnet, we have: def _resnet_split(m): return L(m[0][:6], m[0][6:], m[1:]).map(params)
So we start with a resnet, look for the last pooling layer, and remove everything from that pooling layer onwards (including the pooling layer). What is left is the body, which is split into m[0][:6] and m[0][6:], and each of these gets lr/10.
Since this is the pretrained part, you don’t want to fiddle with it too much. In fact, when the model is frozen these are not updated. The lr makes a difference only when we unfreeze.
The m[1:] is all the new stuff we add, which is the ‘bottom part of the U in unet’ (i.e. middle_conv) and the decoder (look at the code, we add a little more). These layers are initialized with “random” weights (kaiming init), and that is why they have a larger learning rate.
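
To make the three groups concrete, here is a toy sketch that applies the same indexing to a nested list instead of a real model. The layer names are hypothetical stand-ins (loosely modelled on a resnet body), and resnet_split_sketch is not the fastai function: the real one also maps each group to its parameters.

```python
def resnet_split_sketch(m):
    """Split a nested-list 'model' into three groups.

    Toy stand-in: m[0] plays the role of the pretrained body and
    m[1:] the newly added layers. Mapping each group to its
    parameters, as the real splitter does, is omitted here.
    """
    return [m[0][:6], m[0][6:], m[1:]]

# Hypothetical layer names, used only to show where the cuts fall.
body = ["conv1", "bn1", "relu", "maxpool", "layer1", "layer2", "layer3", "layer4"]
model = [body, "middle_conv", "decoder_block_1", "decoder_block_2"]

early, late, new = resnet_split_sketch(model)
print(early)  # first six entries of the body
print(late)   # remaining entries of the body
print(new)    # everything added on top of the body
```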
The interesting thing I found is that the way the models are split is not specific to a single architecture but to a family of architectures. (IIRC Jeremy mentioned that he experimented a bit and defined the splits; I’m not sure of the science behind it, if there is one.)
You want a higher lr for all the new layers as these are “random” weights.
For the body, the layers closest to the input need little tweaking, and the layers after that need a little more tweaking, but less than the newly added layers.
Please correct me where I’m wrong @Srinivas
This is why, irrespective of segmentation or classification, we split the same way and assign the lrs the same way when we use pretrained architectures.
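
The slice(end) behaviour described above can be sketched in a few lines of plain Python. lrs_from_slice is a hypothetical helper, not the library code: the slice(end) branch follows the description in this post, while the geometric spacing used for slice(start, end) is my assumption about how the range gets filled in.

```python
def lrs_from_slice(lr, n_groups):
    """Expand a float or slice into one learning rate per parameter group.

    Hypothetical helper. The slice(end) branch follows the behaviour
    described in the post; the geometric spacing for slice(start, end)
    is an assumption.
    """
    if not isinstance(lr, slice):
        return [lr] * n_groups
    if lr.start is None:
        # Only an end value: last group gets it, the rest get a tenth.
        return [lr.stop / 10] * (n_groups - 1) + [lr.stop]
    if n_groups == 1:
        return [lr.stop]
    # Assumed: evenly spaced multipliers from start up to stop.
    ratio = (lr.stop / lr.start) ** (1 / (n_groups - 1))
    return [lr.start * ratio ** i for i in range(n_groups)]

print(lrs_from_slice(slice(1e-3), 3))        # two small lrs, then the full lr
print(lrs_from_slice(slice(1e-5, 1e-3), 3))  # increasing from start to stop
```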

barnacl · February 14, 2020, 5:38pm

In the segmentation notebook, @muellerzr, you mention “#let's make our vocabulary a part of our DataLoaders, as our loss function needs to deal with the Void label”. I think you meant accuracy and not loss function.
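
For context, here is a minimal plain-Python sketch of why it is the accuracy metric that needs the Void label: pixels labelled Void are dropped before comparing predictions to targets. The vocabulary, the name accuracy_ignoring_void, and the flat-list inputs are all assumptions for illustration, not the notebook's code.

```python
def accuracy_ignoring_void(preds, targets, void_code):
    """Fraction of non-void pixels whose predicted class matches the target.

    preds and targets are flat lists of class indices; void_code is the
    index of the Void class in the vocabulary (assumed here).
    """
    kept = [(p, t) for p, t in zip(preds, targets) if t != void_code]
    if not kept:
        return 0.0
    return sum(p == t for p, t in kept) / len(kept)

codes = ["Road", "Car", "Void"]   # hypothetical vocabulary
void_code = codes.index("Void")
preds = [0, 1, 1, 0, 2]
targets = [0, 1, 0, 2, 2]
acc = accuracy_ignoring_void(preds, targets, void_code)
print(acc)  # 2 of the 3 non-void pixels are correct
```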

foobar8675 · February 14, 2020, 5:39pm

@mgloria, @muellerzr I wrote a split_subsets function based on your first suggestion. It’s here: https://colab.research.google.com/drive/1nTetOULwzZzOZ8849QM7ZQTLcCTH3V1V#scrollTo=jpQs3pDoh7y7 The name of the function is SubsetSplitter. If there is any feedback …

muellerzr · February 14, 2020, 5:40pm

Yes I did, I’ll make that adjustment later today.

barnacl · February 14, 2020, 5:46pm

Wouldn’t first doing a randperm on all the ids and then cutting be safer? If things were arranged in contiguous groups, shuffling and then cutting would hopefully pick a good distribution @foobar8675

foobar8675 · February 14, 2020, 5:55pm

It might be, and I was thinking of that initially, but Zach’s suggestion to do it this way (assuming I understood him correctly) means it could be used after splitting with a fastai splitter.

I think if there were a randperm on all the ids, then it would have to be a replacement for RandomSplitter, GrandparentSplitter, … I think.

muellerzr · February 14, 2020, 6:01pm

Yes, @foobar8675’s idea is how I would’ve implemented it, at least logically. The assumption is that we have pre-defined validation and train sets, which could come from using a different splitter first, and we then take a subset of both, or of all of them if we have more than two. It’s similar to how Lookahead() can be wrapped around any base optimizer.
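
A rough sketch of that wrapping idea in plain Python, assuming a splitter is any function that takes the items and returns two lists of indices. subset_splitter_sketch and first_80_percent are hypothetical names; this is not the code from the Colab notebook.

```python
import random

def subset_splitter_sketch(base_splitter, train_pct, valid_pct, seed=None):
    """Wrap a splitter so each of its two index lists is randomly subsampled."""
    def _inner(items):
        train_idx, valid_idx = base_splitter(items)
        rng = random.Random(seed)  # local RNG, so the global seed is untouched
        train_n = int(len(train_idx) * train_pct)
        valid_n = int(len(valid_idx) * valid_pct)
        return (rng.sample(list(train_idx), train_n),
                rng.sample(list(valid_idx), valid_n))
    return _inner

def first_80_percent(items):
    """Toy base splitter: first 80% of indices train, the rest validation."""
    cut = int(len(items) * 0.8)
    return list(range(cut)), list(range(cut, len(items)))

splitter = subset_splitter_sketch(first_80_percent, train_pct=0.5, valid_pct=0.5, seed=42)
train_idx, valid_idx = splitter(list(range(100)))
print(len(train_idx), len(valid_idx))  # 40 10
```

Because the sampling happens after the base split, the subsets are shuffled within train and within valid, but items never cross from one set to the other.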

barnacl · February 14, 2020, 6:05pm

In the ML course this is how Jeremy does it.
I see what you mean. I think we should be doing it on the items before we pass them in.
I was thinking we should randomly subset (if permissible) the predefined validation and train sets and then pass them to the splitter. I guess both ways would be the same.

foobar8675 · February 14, 2020, 6:45pm

I’ll look at some examples of https://github.com/fastai/fastai2/blob/master/fastai2/optimizer.py#L268 . It’s not something I’ve explored at all. Do you think that would make for a more usable API?

foobar8675 · February 14, 2020, 6:51pm

The way Jeremy does it in the link you posted makes sense, and I’m glad to change. The code I wrote is influenced by this:

github.com

fastai/fastai/blob/master/fastai/data_block.py#L227


    return self.split_by_rand_pct(valid_pct=valid_pct, seed=seed)


def split_by_rand_pct(self, valid_pct:float=0.2, seed:int=None)->'ItemLists':
    "Split the items randomly by putting `valid_pct` in the validation set, optional `seed` can be passed."
    if valid_pct==0.: return self.split_none()
    if seed is not None: np.random.seed(seed)
    rand_idx = np.random.permutation(range_of(self))
    cut = int(valid_pct * len(self))
    return self.split_by_idx(rand_idx[:cut])


def split_subsets(self, train_size:float, valid_size:float, seed=None) -> 'ItemLists':
    "Split the items into train set with size `train_size * n` and valid set with size `valid_size * n`."
    assert 0 < train_size < 1
    assert 0 < valid_size < 1
    assert train_size + valid_size <= 1.
    if seed is not None: np.random.seed(seed)
    n = len(self.items)
    rand_idx = np.random.permutation(range(n))
    train_cut, valid_cut = int(train_size * n), int(valid_size * n)
    return self.split_by_idxs(rand_idx[:train_cut], rand_idx[-valid_cut:])

Which is just the fastai1 way of doing it.

(I do want to explore the optimizer way of doing it a bit and see what comes of it. It’s kind of fun to see how all this is wired together.)

muellerzr · February 14, 2020, 6:55pm

I think so, because now we can just wrap it around any splitter (which is what I initially had in mind). Lookahead is the only one that works like that.
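
For reference, the wrapping pattern looks roughly like this. It is a bare-bones sketch of the Lookahead idea in plain Python (k fast steps, then pull the slow weights toward the fast ones and restart from there), not the fastai2 implementation. The SGD and LookaheadSketch classes and their signatures are made up for illustration.

```python
class SGD:
    """Toy base optimizer: plain gradient descent on a list of floats."""
    def __init__(self, params, lr):
        self.params, self.lr = params, lr

    def step(self, grads):
        for i, g in enumerate(grads):
            self.params[i] -= self.lr * g

class LookaheadSketch:
    """Wraps any optimizer exposing .params and .step(grads)."""
    def __init__(self, base_opt, k=5, alpha=0.5):
        self.opt, self.k, self.alpha = base_opt, k, alpha
        self.slow = list(base_opt.params)  # copy of the starting weights
        self.count = 0

    def step(self, grads):
        self.opt.step(grads)               # fast step from the wrapped optimizer
        self.count += 1
        if self.count % self.k == 0:
            fast = self.opt.params
            for i in range(len(fast)):
                # Move slow weights part of the way toward the fast weights,
                # then restart the fast weights from the slow ones.
                self.slow[i] += self.alpha * (fast[i] - self.slow[i])
                fast[i] = self.slow[i]

base = SGD([1.0], lr=0.1)
opt = LookaheadSketch(base, k=5, alpha=0.5)
for _ in range(10):
    w = base.params[0]
    opt.step([2 * w])   # gradient of f(w) = w**2
print(base.params[0])
```

The wrapper never needs to know how the inner optimizer computes its step, which is the same property we want from a subset wrapper around a splitter.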

barnacl · February 14, 2020, 6:55pm

This is the part I was pointing out; it will be safer if you add it to your implementation.

s.s.o · February 14, 2020, 8:44pm

For image regression, what kinds of explanation mechanisms can we use for DL models? I can think of CAM, layer visualization, ROC, and AIC, but not a confusion matrix, etc. Any suggestions are welcome…

muellerzr · February 14, 2020, 8:48pm

GradCAM and layer visualization are pretty much it for the most part, focusing on where the attention is. You could also isolate each point in the output, since we’d assume that the predicted y1 corresponds to y1 in our ground truth, etc., so we could see which point is having the highest difficulty.
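
As a small sketch of that last idea, assuming predictions and targets are lists of per-sample output vectors, you could compute the mean absolute error per output position and look for the worst one. per_output_mae is a hypothetical helper and the numbers are made up.

```python
def per_output_mae(preds, targets):
    """Mean absolute error for each output position across all samples."""
    n_samples, n_outputs = len(preds), len(preds[0])
    return [
        sum(abs(preds[s][k] - targets[s][k]) for s in range(n_samples)) / n_samples
        for k in range(n_outputs)
    ]

# Two samples, three regression outputs each (made-up values).
preds = [[0.0, 1.0, 2.0], [1.0, 1.0, 5.0]]
targets = [[0.0, 2.0, 2.0], [1.0, 1.0, 1.0]]

errors = per_output_mae(preds, targets)
worst = max(range(len(errors)), key=errors.__getitem__)
print(errors, worst)  # [0.0, 0.5, 2.0] 2
```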

s.s.o · February 14, 2020, 8:50pm

Attention is also a good one … By the way, I was looking for methods not only for point regression but for numeric regression more generally…