@barnacl everything after layers I believe:
You can see we get some Unet blocks followed, at the very end, by a ConvLayer (since we output a ‘picture’ (our masks) instead of a class, as a Linear layer would).
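To see why the head is a conv rather than a Linear layer, here is a minimal numpy sketch (not fastai code, just an illustration): a final 1x1 convolution is a per-pixel linear map over channels, so it produces per-pixel class logits and the output keeps its spatial shape, whereas a Linear layer would collapse everything to one vector of class scores.

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 convolution is just a per-pixel linear map over channels.
    x: (C_in, H, W) feature map, w: (C_out, C_in) weights."""
    return np.einsum('oc,chw->ohw', w, x)

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 8, 8))   # decoder output: 16 channels, 8x8
w = rng.standard_normal((3, 16))          # say 3 classes in the mask

logits = conv1x1(feats, w)                # per-pixel class scores, shape (3, 8, 8)
mask = logits.argmax(axis=0)              # predicted mask, same 8x8 grid as the input
```

So the segmentation output is one class score per pixel, which is exactly a ‘picture’ of the mask.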
Is there any reference explaining PixelShuffle and ICNR? Also, I’m not able to understand the blur parameter of unet_config. I’m aware it adds ReplicationPad, and I found that with blur=True the generated images are smoothed, as opposed to the jagged ones with blur=False.
Almost any layer that looks different or odd is in the layers.py file. If you look, you can see PixelShuffle is just nn.PixelShuffle (plus a few bits).
True. But I want to know the theory behind it and why it’s used extensively by fastai. I believe Jeremy once said he’d explain it in the second part of the course, but I didn’t find any reference to that. Looking for an article/paper expounding on this topic.
For pixel shuffle specifically the Torch code itself references this paper:
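If it helps to see the mechanics, here is a numpy sketch of what nn.PixelShuffle does for a single sample, plus a rough sketch of the ICNR idea (the icnr_init helper name is made up; fastai’s actual implementation differs in details):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) array into (C, H*r, W*r),
    like torch.nn.PixelShuffle(r) does for one sample."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into the r x r sub-pixel grid
    x = x.transpose(0, 3, 1, 4, 2)    # (c, h, r, w, r): interleave rows and columns
    return x.reshape(c, h * r, w * r)

def icnr_init(c_out, c_in, k, r, init=np.random.randn):
    """Sketch of ICNR: initialise only c_out // r^2 distinct filters and repeat
    each r^2 times, so conv + pixel_shuffle starts out equivalent to
    nearest-neighbour upsampling (which avoids checkerboard artefacts early on)."""
    base = init(c_out // (r * r), c_in, k, k)
    return np.repeat(base, r * r, axis=0)

x = np.arange(4 * 2 * 2).reshape(4, 2, 2).astype(float)  # C*r^2 = 4, so r=2, C=1
y = pixel_shuffle(x, 2)                                   # shape (1, 4, 4)
w = icnr_init(8, 3, 3, 2)  # 8 output channels for r=2 -> 2 distinct filters, each repeated 4x
```

With ICNR, every group of r^2 consecutive output channels is identical at init, so each output pixel in a 2x2 block starts with the same value.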
For PyTorch tutorials, I personally liked this one; it’s more of an intro: https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
Found a few answers after exploring the code as @muellerzr suggested.
If you pass just slice(end), then the last group’s learning rate is end and all the other groups get end/10; see here (look under lr_range). What role slice plays can be seen here.
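A rough pure-python sketch of the behaviour described above (the function name mirrors fastai’s lr_range, but this is a simplification; the slice(start, end) case spreading rates geometrically is how I recall it working, so treat that branch as an assumption):

```python
import numpy as np

def lr_range(lr, n_groups):
    """Sketch of how a slice is turned into per-group learning rates.
    slice(end): every group gets end/10 except the last, which gets end.
    slice(start, end): rates spread geometrically from start to end (assumed)."""
    if not isinstance(lr, slice):
        return [lr] * n_groups                               # one lr for everything
    if lr.start is None:
        return [lr.stop / 10] * (n_groups - 1) + [lr.stop]   # the slice(end) case
    return list(np.geomspace(lr.start, lr.stop, n_groups))

print(lr_range(slice(1e-3), 3))  # [0.0001, 0.0001, 0.001]
```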
So now we get three lrs, i.e. [lr/10, lr/10, lr]. How is the network split so we can apply these different lrs?
This happens in the Learner: it takes a parameter called splitter. Splitters are a bunch of functions that are defined based on the architecture family.
As we are dealing with a resnet we have: def _resnet_split(m): return L(m[0][:6], m[0][6:], m[1:]).map(params)
So we start with a resnet, look for the last pooling layer, and remove everything from that pooling layer onwards (including the pooling layer itself). What is left is the body, which is split into m[0][:6] and m[0][6:], and each of these gets lr/10.
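With plain lists standing in for the model (the layer names below are made up for illustration, not the real module names), the split amounts to:

```python
# Toy stand-in for the unet model: m[0] is the pretrained body (cut before the
# last pooling layer), m[1:] is everything fastai adds on top.
body = ['stem', 'bn', 'relu', 'pool', 'layer1', 'layer2', 'layer3', 'layer4']
head = ['middle_conv', 'decoder1', 'decoder2', 'final_conv']
m = [body] + head

# Mimic the indexing in _resnet_split: three parameter groups.
groups = [m[0][:6], m[0][6:], m[1:]]

print(groups[0])  # early body layers -> lr/10 (barely touched)
print(groups[1])  # later body layers -> lr/10
print(groups[2])  # new, randomly-initialised layers -> lr
```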
Since this is the pretrained part, you don’t want to fiddle around with it too much. In fact, when the model is frozen these layers are not updated at all; the lr makes a difference only when we unfreeze.
The m[1:] is all the new stuff we add, which is the ‘bottom part of the U in unet’ (i.e. middle_conv) and the decoder (look at the code, we add a little more). These layers are initialized with “random” weights (kaiming init), and that is why they get a larger learning rate.
The interesting thing I found is that the way the models are split is not specific to one architecture but rather to a family of architectures. (IIRC Jeremy mentioned that he experimented a bit and defined the splits; not sure of the science behind it, if there is one.)
You want a higher lr for all the new layers, as these are “random” weights. For the body, the layers closest to the input need little tweaking, and the layers after that a little more, but still less than the newly added layers.
Please correct me where I’m wrong, @Srinivas.
This is why, irrespective of segmentation or classification, we split the same way and assign the lrs the same way when we use pretrained architectures.
In the segmentation notebook, @muellerzr, you mention #let's make our vocabulary a part of our DataLoaders, as our loss function needs to deal with the Void label. I think you meant accuracy and not loss function.
@mgloria, @muellerzr I wrote a split_subsets function based on your first suggestion. It’s here: https://colab.research.google.com/drive/1nTetOULwzZzOZ8849QM7ZQTLcCTH3V1V#scrollTo=jpQs3pDoh7y7 (the name of the function is SubsetSplitter). If there is any feedback …
Yes I did, I’ll make that adjustment later today.
Wouldn’t first doing a randperm on all the ids and then cutting be safer? If things were arranged in contiguous groups, shuffling and then cutting would hopefully pick a good distribution. @foobar8675
It might be, and I was thinking of that initially, but Zach’s suggestion to do it this way (assuming I understood him correctly) makes it such that it can be used after splitting with a fastai splitter.
I think if there were a randperm on all the ids, then it would have to be a replacement for RandomSplitter, GrandparentSplitter, …
Yes, @foobar8675’s idea is how I would’ve implemented it, at least logically. The assumption is we have pre-defined validation and train sets, which could come from using a different splitter first, from which we then take a subset of both (or all, if we have more than 2). Similar to how Lookahead() can be wrapped around any base optimizer.
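One way the “wrap any splitter” idea could look, as a sketch (subset_wrapper and toy_splitter are hypothetical names, not fastai API): a splitter here is just a function from items to a tuple of index lists, and the wrapper subsamples each split, the way Lookahead wraps a base optimizer.

```python
import random

def subset_wrapper(splitter, p, seed=None):
    """Hypothetical sketch: wrap any splitter (a function items -> tuple of
    index lists) so that each split is randomly subsampled to a fraction p."""
    def _inner(items):
        rng = random.Random(seed)
        splits = splitter(items)
        return tuple(sorted(rng.sample(s, max(1, int(len(s) * p)))) for s in splits)
    return _inner

# Usage with a toy 80/20 splitter standing in for e.g. RandomSplitter:
def toy_splitter(items):
    n = int(len(items) * 0.8)
    return list(range(n)), list(range(n, len(items)))

small = subset_wrapper(toy_splitter, p=0.5, seed=42)
train_idx, valid_idx = small(list(range(100)))
print(len(train_idx), len(valid_idx))  # 40 10
```

The nice property is that the train/valid boundary is decided by the inner splitter first, so subsampling can never leak items across it.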
In the ML course this is how jeremy does it.
I see what you mean. I think we should be doing it on items before we pass it in.
I was thinking we should randomly subset (if permissible) the predefined validation and train sets and then pass them to the splitter. I guess both ways would be the same.
I’ll look at some examples of https://github.com/fastai/fastai2/blob/master/fastai2/optimizer.py#L268 . It’s not something I’ve explored at all. Do you think that would make for a more usable API?
The way Jeremy does it in the link you posted makes sense, and I’m glad to change. The code I wrote is influenced by this, which is just the fastai1 way of doing it.
(I do want to explore the optimizer way of doing it a bit and see what comes of it. Kind of fun to see how all this is wired together.)
I think so, because then we can just wrap it around any splitter (what I initially had in mind). Lookahead is the only one that works like that.
This is the part I was pointing out that would be safer to add in your implementation.