A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

@barnacl everything after layers I believe:

You can see we get some UnetBlocks, followed at the very end by a ConvLayer (since we output a 'picture', our masks, instead of a class the way a Linear layer would).
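If you want to see this yourself, a quick sketch (assuming a segmentation learner, e.g. `learn = unet_learner(dls, resnet34)`):

```python
print(learn.model)  # DynamicUnet: encoder, middle convs, UnetBlocks, then a final ConvLayer
learn.summary()     # per-layer output shapes, parameter counts, and trainable flags
```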


thanks @muellerzr that helps :slight_smile:


Is there any reference explaining PixelShuffle and ICNR? Also, I'm not able to understand the blur parameter of unet_config. I'm aware it adds a ReplicationPad, and I found that with blur=True the generated images are smooth, as opposed to the jagged ones with blur=False.

Almost any layer that looks different or odd is in the layers.py file. If you look you can see PixelShuffle is just nn.PixelShuffle (plus a few bits).
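For the curious, here is a minimal sketch of what those "few bits" amount to, paraphrasing the ICNR init and blur option from layers.py (the class name `PixelShuffleUp` and the exact layer choices here are illustrative, not fastai's actual code):

```python
import torch
import torch.nn as nn

def icnr_init(w, scale=2, init=nn.init.kaiming_normal_):
    "ICNR: init a sub-pixel conv so pixel shuffle starts out as nearest-neighbour upsampling."
    out_ch, in_ch, h, kw = w.shape
    k = init(w.new_zeros([out_ch // scale**2, in_ch, h, kw]))
    return k.repeat_interleave(scale**2, dim=0)  # identical kernels within each scale*scale group

class PixelShuffleUp(nn.Sequential):
    "Illustrative PixelShuffle upsampler with ICNR init and optional blur."
    def __init__(self, ni, nf=None, scale=2, blur=False):
        nf = nf or ni
        conv = nn.Conv2d(ni, nf * scale**2, kernel_size=1)
        conv.weight.data.copy_(icnr_init(conv.weight.data, scale))
        layers = [conv, nn.ReLU(inplace=True), nn.PixelShuffle(scale)]
        if blur:
            # pad one pixel (left/top), then 2x2 stride-1 average pool: smears each pixel
            # over its neighbours, smoothing away checkerboard artifacts
            layers += [nn.ReplicationPad2d((1, 0, 1, 0)), nn.AvgPool2d(2, stride=1)]
        super().__init__(*layers)

x = torch.randn(1, 64, 32, 32)
print(PixelShuffleUp(64, blur=True)(x).shape)  # torch.Size([1, 64, 64, 64])
```

The ICNR trick makes every scale×scale group of output channels start out identical, so the first forward pass is equivalent to nearest-neighbour upsampling and checkerboard artifacts never get a head start; blur=True then smooths whatever artifacts appear during training.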

True, but I want to know the theory behind it and why it's used so extensively by fastai. I believe Jeremy once said he'd explain it in the second part of the course, but I didn't find any reference to that. I'm looking for an article/paper expounding on this topic.

@kshitijpatil09 Fastai Unet


For pixel shuffle specifically, the PyTorch code itself references this paper: Shi et al. (2016), "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network", https://arxiv.org/abs/1609.05158. For ICNR, see Aitken et al. (2017), "Checkerboard artifact free sub-pixel convolution", https://arxiv.org/abs/1707.02937.


For PyTorch tutorials, I personally liked this one; it's more of an intro: https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html


Found a few answers after exploring the code as @muellerzr suggested.
If you pass just slice(end), then the last group's learning rate is end and all the other groups get end/10; see here (look under lr_range). What role slice plays can be seen here.
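For anyone following along, a hedged sketch of that expansion (the function name and n_groups here are illustrative; the real logic is fastai's lr_range):

```python
import numpy as np

def lr_range(lr, n_groups=3):
    "Expand a lr or slice into one lr per parameter group (illustrative re-implementation)."
    if not isinstance(lr, slice):
        return [lr] * n_groups                          # a plain float: same lr everywhere
    if lr.start:                                        # slice(start, end): log-spaced in between
        return np.geomspace(lr.start, lr.stop, n_groups).tolist()
    return [lr.stop / 10] * (n_groups - 1) + [lr.stop]  # slice(end): end/10, ..., end/10, end

print(lr_range(slice(3e-3)))        # [0.0003, 0.0003, 0.003]
print(lr_range(slice(1e-6, 1e-3)))  # [1e-06, ~3.16e-05, 0.001]
```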
So now we get three lrs, i.e. [lr/10, lr/10, lr]. How is the network split so we can apply these different lrs?
This happens in the Learner: it takes a parameter called splitter. Splitters are a bunch of functions defined per architecture family.
As we are dealing with a resnet we have:

```python
def _resnet_split(m): return L(m[0][:6], m[0][6:], m[1:]).map(params)
```
So we start with a resnet, look for the last pooling layer, and remove everything from that pooling layer onwards (including the pooling layer itself). What is left is the body, which is split into m[0][:6] and m[0][6:], and each of these gets lr/10.
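You can check the three groups yourself (hedged; assumes a resnet-backed learn, with fastcore's L and fastai2's params in scope):

```python
m = learn.model
groups = L(m[0][:6], m[0][6:], m[1:]).map(params)  # same recipe as _resnet_split
print([len(g) for g in groups])  # param tensors in: early body, late body, new head
```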
Since this is the pretrained part, you don't want to fiddle with it too much. In fact, when the model is frozen these groups are not updated at all; their lrs only matter once we unfreeze.
m[1:] is all the new stuff we add, which is the 'bottom part of the U' in the unet (i.e. middle_conv) and the decoder (look at the code, we add a little more). These layers are initialized with "random" weights (kaiming init), and that is why they get a larger learning rate.
The interesting thing I found is that the way the models are split is not specific to a single architecture but to a family of architectures. (IIRC Jeremy mentioned that he experimented a bit and defined the splits; not sure of the science behind it, if there is one.)
You want a higher lr for all the new layers, as these start from "random" weights.
For the body, the layers closest to the input need little tweaking, and the layers after them a little more, but still less than the newly added layers.
Please correct me where i’m wrong :slight_smile: @Srinivas
This is why, irrespective of segmentation or classification, we split the same way and assign the lrs the same way whenever we use pretrained architectures.
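Putting it together as a usage sketch (hedged; assumes dls is a segmentation DataLoaders):

```python
from fastai2.vision.all import *

learn = unet_learner(dls, resnet34)
learn.fit_one_cycle(8, 3e-3)               # frozen: only the new head (m[1:]) is updated
learn.unfreeze()
learn.fit_one_cycle(4, slice(1e-6, 1e-3))  # body groups get smaller lrs than the head
```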


In the segmentation notebook @muellerzr you mention `# let's make our vocabulary a part of our DataLoaders, as our loss function needs to deal with the Void label`. I think you meant accuracy and not loss function.
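For reference, the camvid-style accuracy that has to deal with Void looks roughly like this (hedged sketch; in the notebooks void_code is usually a global looked up from the vocab rather than a parameter):

```python
def acc_camvid(inp, targ, void_code):
    "Pixel accuracy that ignores pixels labelled Void."
    targ = targ.squeeze(1)
    mask = targ != void_code
    return (inp.argmax(dim=1)[mask] == targ[mask]).float().mean()
```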

@mgloria, @muellerzr I wrote a split_subsets function based on your first suggestion; it's here: https://colab.research.google.com/drive/1nTetOULwzZzOZ8849QM7ZQTLcCTH3V1V#scrollTo=jpQs3pDoh7y7 (the name of the function is SubsetSplitter). If there is any feedback …


Yes I did, I’ll make that adjustment later today.


Wouldn't first doing a randperm on all the ids and then cutting be safer? If things were arranged in contiguous groups, shuffling them and then cutting would hopefully pick a good distribution. @foobar8675
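i.e. something like this (illustrative, with `items` standing in for the full item list):

```python
import torch

ids = torch.randperm(len(items))             # shuffle every index first
cut = int(0.8 * len(items))
train_ids, valid_ids = ids[:cut], ids[cut:]  # then cut into train/valid
```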

It might be, and I was thinking of that initially, but Zach's suggestion to do it this way (assuming I understood him correctly) makes it such that it can be used after splitting with a fastai splitter.

I think if there were to be a randperm on all the ids, then it would have to be a replacement for RandomSplitter, GrandparentSplitter, etc.

Yes, @foobar8675's idea is how I would've implemented it, at least logically. The assumption is that we have pre-defined validation and train splits, which could come from using a different splitter first; we then take a subset of both (or of all of them, if we have more than 2). Similar to how Lookahead() can be wrapped around any base optimizer.
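A minimal sketch of that wrapping idea (the name subset_splitter is made up here, not a fastai API):

```python
import random

def subset_splitter(splitter, pct=0.1, seed=None):
    "Wrap any splitter (RandomSplitter, GrandparentSplitter, ...) to keep a random `pct` of each split."
    rng = random.Random(seed)
    def _inner(items):
        return [rng.sample(list(idxs), int(len(idxs) * pct)) for idxs in splitter(items)]
    return _inner

# usage sketch:
# dblock = DataBlock(..., splitter=subset_splitter(RandomSplitter(), pct=0.25))
```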


In the ML course this is how Jeremy does it.
I see what you mean. I think we should be doing it on the items before we pass them in.
I was thinking we should randomly subset (if permissible) the predefined validation and train sets and then pass them to the splitter. I guess both ways would end up the same.

I'll look at some examples of https://github.com/fastai/fastai2/blob/master/fastai2/optimizer.py#L268. It's not something I've explored at all. Do you think that would make for a more usable API?

The way Jeremy does it in the link you posted makes sense, and I'm glad to change mine. The code I wrote is influenced by this, which is just the fastai1 way of doing it.

(I do want to explore the optimizer way of doing it a bit and see what comes of it. It's kind of fun to see how all this is wired together.)

I think so, because now we can just wrap it around any splitter (what I initially had in mind). Lookahead is the only thing that works like that.

This is the part I was pointing out; it would be safer if you added it to your implementation.
