A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

Almost any layer that looks different or odd is in the layers.py file. If you look you can see PixelShuffle is just nn.PixelShuffle (plus a few bits).
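For example, a quick shape check of what the underlying torch layer does (from memory, the fastai bits on top are mostly a conv plus the ICNR init):

```python
import torch
import torch.nn as nn

# nn.PixelShuffle rearranges channels into space: (N, C*r^2, H, W) -> (N, C, H*r, W*r)
ps = nn.PixelShuffle(upscale_factor=2)
x = torch.randn(1, 64, 16, 16)
print(ps(x).shape)   # torch.Size([1, 16, 32, 32])
```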

True. But I want to know the theory behind it and why it's used extensively by fastai. I believe Jeremy once said he'll explain it in the second part of the course, but I didn't find any reference to that. Looking for an article/paper expounding on this topic.

@kshitijpatil09 Fastai Unet

For pixel shuffle specifically, the PyTorch code itself references this paper: "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network" (Shi et al., 2016).

For PyTorch tutorials, I personally liked this one; it's more of an intro: https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

Found a few answers after exploring the code as @muellerzr suggested.
If you pass just slice(end), then the last group's learning rate is end and all the other groups get end/10 (see lr_range in the Learner source, which also shows what role slice plays).
So now we get three lrs, i.e. [lr/10, lr/10, lr]. How is the network split so we can apply these different lrs?
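A minimal sketch of that expansion (my paraphrase of lr_range, not the fastai source verbatim):

```python
import numpy as np

# Expand a single lr or a slice into one lr per parameter group (sketch of lr_range)
def lr_range(lr, n_groups):
    if not isinstance(lr, slice):
        return [lr] * n_groups
    if lr.start is not None:
        # slice(start, end): geometrically spaced lrs from start to end
        return list(np.geomspace(lr.start, lr.stop, n_groups))
    # slice(end): last group gets end, every earlier group gets end/10
    return [lr.stop / 10] * (n_groups - 1) + [lr.stop]

print(lr_range(slice(1e-3), 3))        # [0.0001, 0.0001, 0.001]
print(lr_range(slice(1e-5, 1e-3), 3))  # ~[1e-05, 1e-04, 1e-03]
```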
This happens in the Learner: it takes a parameter called splitter. Splitters are a bunch of functions that are defined based on the architecture family.
As we are dealing with a resnet we have: def _resnet_split(m): return L(m[0][:6], m[0][6:], m[1:]).map(params)
So we start with a resnet, look for the last pooling layer, and remove everything from that pooling layer onwards (including the pooling layer). What is left is the body, which is split into m[0][:6] and m[0][6:]; each of these gets lr/10.
Since this is the pretrained part, you don't want to fiddle around too much. In fact, when the model is frozen these groups are not updated at all; the lr makes a difference only when we unfreeze.
The m[1:] is all the new stuff we add, which is the 'bottom part of the U in the unet' (i.e. middle_conv) and the decoder (look at the code, we add a little more). These layers are initialized with "random" weights (Kaiming init) and that is why they have a larger learning rate.
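To make the three groups concrete, here's a rough sketch (it assumes dls is the segmentation DataLoaders built earlier in the notebook, so it won't run on its own):

```python
from fastai2.vision.all import *

# `dls` is assumed to already exist (the segmentation DataLoaders from the notebook)
learn = unet_learner(dls, resnet34)

# the resnet-family split quoted above: encoder[:6], encoder[6:], and everything new
groups = L(learn.model[0][:6], learn.model[0][6:], learn.model[1:]).map(params)
print(len(groups))                            # 3 parameter groups

learn.fit_one_cycle(1, lr_max=slice(1e-3))    # -> lrs [1e-4, 1e-4, 1e-3] across those groups
```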
The interesting thing I found is that the way the models are split is not specific to one architecture but rather to a family of architectures. (IIRC Jeremy mentioned that he experimented a bit and defined the splits; not sure of the science behind it, if there is one.)
You want a higher lr for all the new layers as these are "random" weights.
For the body, the layers closest to the input need little tweaking, and the layers after that a little more, but still less than the newly added layers.
Please correct me where I'm wrong :slight_smile: @Srinivas
This is why, irrespective of segmentation or classification, we split the same way and assign the lrs the same way when we use pretrained architectures.

In the segmentation notebook, @muellerzr, you mention "#let's make our vocabulary a part of our DataLoaders, as our loss function needs to deal with the Void label". I think you meant accuracy and not loss function.
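i.e. something along these lines, where the metric (not the loss) needs the vocab to find the Void index (just a sketch of the usual CamVid-style accuracy, not necessarily the exact code in the notebook):

```python
# accuracy that ignores pixels labelled Void; assumes `void_code` is the
# index of 'Void' in dls.vocab
def acc_no_void(inp, targ, void_code):
    targ = targ.squeeze(1)                 # drop a channel dim if present
    mask = targ != void_code
    return (inp.argmax(dim=1)[mask] == targ[mask]).float().mean()
```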

@mgloria, @muellerzr I wrote a split_subsets function based on your first suggestion. It's here: https://colab.research.google.com/drive/1nTetOULwzZzOZ8849QM7ZQTLcCTH3V1V#scrollTo=jpQs3pDoh7y7 (the name of the function is SubsetSplitter). Any feedback is welcome.

Yes I did, I'll make that adjustment later today.

Wouldn't first doing a randperm on all the ids and then cutting be safer? If things were arranged in contiguous groups, shuffling and then cutting would hopefully pick a good distribution. @foobar8675

It might be, and I was thinking of that initially, but Zach's suggestion to do it this way (assuming I understood him correctly) makes it so that it could be used after splitting with a fastai splitter.

I think if there were to be a randperm on all the ids, then it would have to be a replacement for RandomSplitter, GrandparentSplitter, etc.

Yes, @foobar8675's idea is how I would've implemented it, at least logically. The assumption is we have pre-defined validation and train splits, which could come from using a different splitter first, and then we take a subset of both (or of all of them, if we have more than 2). Similar to how Lookahead() can be wrapped around any base optimizer.
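Roughly what I have in mind (just a sketch; subset_splitter is my own name, not a fastai function):

```python
import torch
from fastai2.data.transforms import RandomSplitter

# wrap any base splitter and keep a random fraction of each split,
# shuffling with randperm before cutting
def subset_splitter(base_splitter, frac=0.1, seed=None):
    def _inner(items):
        if seed is not None: torch.manual_seed(seed)
        subsets = []
        for idxs in base_splitter(items):
            idxs = list(idxs)
            keep = max(1, int(len(idxs) * frac))
            perm = torch.randperm(len(idxs)).tolist()
            subsets.append([idxs[i] for i in perm[:keep]])
        return subsets
    return _inner

# e.g. DataBlock(..., splitter=subset_splitter(RandomSplitter(valid_pct=0.2), frac=0.1))
```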

In the ML course this is how Jeremy does it.
I see what you mean. I think we should be doing it on the items before we pass them in.
I was thinking we should randomly subset (if permissible) the predefined validation and train sets and then pass them to the splitter. I guess both ways would be the same.

I'll look at some examples of https://github.com/fastai/fastai2/blob/master/fastai2/optimizer.py#L268. It's not something I've explored at all. Do you think that would make for a more usable API?

The way Jeremy does it in the link you posted makes sense and I'm glad to change. The code I wrote is influenced by this, which is just the fastai1 way of doing it.

(I do want to explore the optimizer way of doing it a bit and see what comes of it. Kind of fun to see how all this is wired together.)

I think so, because now we can just wrap it around any splitter (what I initially had in mind). Lookahead is the only one that works like that.

This is the part I was pointing out; it would be safer if you add it to your implementation.

For image regression, what types of explanation mechanisms can we use for DL models? I can think of CAM and layer visualization, but not ROC, AIC, confusion matrix, etc. Any suggestions are welcome.

GradCAM and layer visualization are pretty much it for the most part, focusing on where the attention is. You could also isolate each point in the output, as we'd assume that y1 would go to y1 in our ground truth, etc., so we could see which point is having the highest difficulty.
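If it helps, here's a bare-bones Grad-CAM sketch with plain PyTorch hooks for a regression output (the layer choice and names here are assumptions, not fastai API):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, output_index=0):
    # cache the target layer's activations and the gradient flowing back into them
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    model.eval()
    x = x.requires_grad_()               # make sure gradients flow back through the conv features
    out = model(x)                       # x: (1, C, H, W); out: (1, n_targets) for regression
    out[0, output_index].backward()      # gradient of a single regression target
    h1.remove(); h2.remove()
    w = grads['g'].mean(dim=(2, 3), keepdim=True)             # per-channel weights
    cam = F.relu((w * acts['a']).sum(dim=1, keepdim=True))    # weighted sum of activations
    return F.interpolate(cam, size=x.shape[-2:], mode='bilinear', align_corners=False)
```

You'd point target_layer at a late conv block (e.g. the last block of the encoder) and overlay the upsampled map on the input image.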

Attention is also a good one. By the way, I was trying to find options not only for point regression but more generally for numeric targets.
