@barnacl everything after layers I believe:
You can see we get some Unet blocks followed, at the very end, by a ConvLayer (since we output a ‘picture’ (our masks) instead of a class, as a Linear layer would).
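To see why the head is a conv rather than a Linear layer, here is a minimal numpy sketch (not fastai code, just an illustration): a final 1x1 convolution is a per-pixel linear map over channels, so it produces per-pixel class logits and the output keeps its spatial shape, whereas a Linear layer would collapse everything to one vector of class scores.

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 convolution is just a per-pixel linear map over channels.
    x: (C_in, H, W) feature map, w: (C_out, C_in) weights."""
    return np.einsum('oc,chw->ohw', w, x)

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 8, 8))   # decoder output: 16 channels, 8x8
w = rng.standard_normal((3, 16))          # say 3 classes in the mask

logits = conv1x1(feats, w)                # per-pixel class scores, shape (3, 8, 8)
mask = logits.argmax(axis=0)              # predicted mask, same 8x8 grid as the input
```

So the segmentation output is one class score per pixel, which is exactly a ‘picture’ of the mask.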
Is there any reference explaining PixelShuffle and ICNR? Also, I’m not able to understand the blur parameter of unet_config. I’m aware it adds ReplicationPad, and I found that with blur=True the generated images are smoothed, as opposed to the jagged ones with blur=False.
Almost any layer that looks different or odd is in the layers.py file. If you look, you can see PixelShuffle is just nn.PixelShuffle (plus a few bits).
True. But I want to know the theory behind it and why it’s used extensively by fastai. I believe Jeremy once said he’d explain it in the second part of the course, but I didn’t find any reference to that. Looking for an article/paper expounding on this topic.
For pixel shuffle specifically the Torch code itself references this paper:
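If it helps to see the mechanics, here is a numpy sketch of what nn.PixelShuffle does for a single sample, plus a rough sketch of the ICNR idea (the icnr_init helper name is made up; fastai’s actual implementation differs in details):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) array into (C, H*r, W*r),
    like torch.nn.PixelShuffle(r) does for one sample."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into the r x r sub-pixel grid
    x = x.transpose(0, 3, 1, 4, 2)    # (c, h, r, w, r): interleave rows and columns
    return x.reshape(c, h * r, w * r)

def icnr_init(c_out, c_in, k, r, init=np.random.randn):
    """Sketch of ICNR: initialise only c_out // r^2 distinct filters and repeat
    each r^2 times, so conv + pixel_shuffle starts out equivalent to
    nearest-neighbour upsampling (which avoids checkerboard artefacts early on)."""
    base = init(c_out // (r * r), c_in, k, k)
    return np.repeat(base, r * r, axis=0)

x = np.arange(4 * 2 * 2).reshape(4, 2, 2).astype(float)  # C*r^2 = 4, so r=2, C=1
y = pixel_shuffle(x, 2)                                   # shape (1, 4, 4)
w = icnr_init(8, 3, 3, 2)  # 8 output channels for r=2 -> 2 distinct filters, each repeated 4x
```

With ICNR, every group of r^2 consecutive output channels is identical at init, so each output pixel in a 2x2 block starts with the same value.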
For PyTorch tutorials, I personally liked this one; it’s more of an intro: https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
Found a few answers after exploring the code as @muellerzr suggested.
If you pass just slice(end), then the last group’s learning rate is end and all the other groups get end/10; see here (look under lr_range). What role slice plays can be seen here.
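A rough pure-python sketch of the behaviour described above (the function name mirrors fastai’s lr_range, but this is a simplification; the slice(start, end) case spreading rates geometrically is how I recall it working, so treat that branch as an assumption):

```python
import numpy as np

def lr_range(lr, n_groups):
    """Sketch of how a slice is turned into per-group learning rates.
    slice(end): every group gets end/10 except the last, which gets end.
    slice(start, end): rates spread geometrically from start to end (assumed)."""
    if not isinstance(lr, slice):
        return [lr] * n_groups                               # one lr for everything
    if lr.start is None:
        return [lr.stop / 10] * (n_groups - 1) + [lr.stop]   # the slice(end) case
    return list(np.geomspace(lr.start, lr.stop, n_groups))

print(lr_range(slice(1e-3), 3))  # [0.0001, 0.0001, 0.001]
```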
So now we get three lrs, i.e. [lr/10, lr/10, lr]. How is the network split so we can apply these different lrs?
This happens in the Learner: it takes a parameter called splitter. Splitters are a bunch of functions that are defined based on the architecture family.
As we are dealing with a resnet we have: def _resnet_split(m): return L(m[0][:6], m[0][6:], m[1:]).map(params)
So we start with a resnet, look for the last pooling layer, and remove everything from that pooling layer onwards (including the pooling layer itself). What is left is the body, which is split into m[0][:6] and m[0][6:], and each of these gets lr/10.
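With plain lists standing in for the model (the layer names below are made up for illustration, not the real module names), the split amounts to:

```python
# Toy stand-in for the unet model: m[0] is the pretrained body (cut before the
# last pooling layer), m[1:] is everything fastai adds on top.
body = ['stem', 'bn', 'relu', 'pool', 'layer1', 'layer2', 'layer3', 'layer4']
head = ['middle_conv', 'decoder1', 'decoder2', 'final_conv']
m = [body] + head

# Mimic the indexing in _resnet_split: three parameter groups.
groups = [m[0][:6], m[0][6:], m[1:]]

print(groups[0])  # early body layers -> lr/10 (barely touched)
print(groups[1])  # later body layers -> lr/10
print(groups[2])  # new, randomly-initialised layers -> lr
```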
Since this is the pretrained part, you don’t want to fiddle around with it too much. In fact, when the model is frozen these layers are not updated at all; the lr makes a difference only when we unfreeze.
The m[1:] is all the new stuff we add, which is the ‘bottom part of the U in unet’ (i.e. middle_conv) and the decoder (look at the code, we add a little more). These layers are initialized with “random” weights (kaiming init), and that is why they get a larger learning rate.
The interesting thing I found is that the way the models are split is not specific to one architecture but rather to a family of architectures. (IIRC Jeremy mentioned that he experimented a bit and defined the splits; not sure of the science behind it, if there is one.)
You want a higher lr for all the new layers, as these are “random” weights. For the body, the layers closest to the input need little tweaking, and the layers after that a little more, but still less than the newly added layers.
Please correct me where I’m wrong, @Srinivas.
This is why, irrespective of segmentation or classification, we split the same way and assign the lrs the same way when we use pretrained architectures.
In the segmentation notebook, @muellerzr, you mention #let's make our vocabulary a part of our DataLoaders, as our loss function needs to deal with the Void label. I think you meant accuracy and not loss function.
@mgloria, @muellerzr I wrote a split_subsets function based on your first suggestion. It’s here: https://colab.research.google.com/drive/1nTetOULwzZzOZ8849QM7ZQTLcCTH3V1V#scrollTo=jpQs3pDoh7y7 (the name of the function is SubsetSplitter). If there is any feedback …
Yes I did, I’ll make that adjustment later today.
Wouldn’t first doing a randperm on all the ids and then cutting be safer? If things were arranged in contiguous groups, shuffling and then cutting would hopefully pick a good distribution. @foobar8675
It might be, and I was thinking of that initially, but Zach’s suggestion to do it this way (assuming I understood him correctly) makes it such that it can be used after splitting with a fastai splitter.
I think if there were a randperm on all the ids, then it would have to be a replacement for RandomSplitter, GrandparentSplitter, …
Yes, @foobar8675’s idea is how I would’ve implemented it, at least logically. The assumption is we have pre-defined validation and train sets, which could come from using a different splitter first, from which we then take a subset of both (or all, if we have more than 2). Similar to how Lookahead() can be wrapped around any base optimizer.
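One way the “wrap any splitter” idea could look, as a sketch (subset_wrapper and toy_splitter are hypothetical names, not fastai API): a splitter here is just a function from items to a tuple of index lists, and the wrapper subsamples each split, the way Lookahead wraps a base optimizer.

```python
import random

def subset_wrapper(splitter, p, seed=None):
    """Hypothetical sketch: wrap any splitter (a function items -> tuple of
    index lists) so that each split is randomly subsampled to a fraction p."""
    def _inner(items):
        rng = random.Random(seed)
        splits = splitter(items)
        return tuple(sorted(rng.sample(s, max(1, int(len(s) * p)))) for s in splits)
    return _inner

# Usage with a toy 80/20 splitter standing in for e.g. RandomSplitter:
def toy_splitter(items):
    n = int(len(items) * 0.8)
    return list(range(n)), list(range(n, len(items)))

small = subset_wrapper(toy_splitter, p=0.5, seed=42)
train_idx, valid_idx = small(list(range(100)))
print(len(train_idx), len(valid_idx))  # 40 10
```

The nice property is that the train/valid boundary is decided by the inner splitter first, so subsampling can never leak items across it.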
In the ML course this is how jeremy does it.
I see what you mean. I think we should be doing it on items before we pass it in.
I was thinking we should randomly subset (if permissible) the predefined validation and train sets and then pass them to the splitter. I guess both ways would be the same.
I’ll look at some examples of https://github.com/fastai/fastai2/blob/master/fastai2/optimizer.py#L268 . It’s not something I’ve explored at all. Do you think that would make for a more usable API?
The way Jeremy does it in the link you posted makes sense, and I’m glad to change. The code I wrote is influenced by this, which is just the fastai1 way of doing it.
(I do want to explore the optimizer way of doing it a bit and see what comes of it. Kind of fun to see how all this is wired together.)
I think so, because then we can just wrap it around any splitter (what I initially had in mind). Lookahead is the only one that works like that.
This is the part I was pointing out that would be safer to add in your implementation.