A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

@muellerzr I have a fastai2 question not necessarily related to your notebooks, but since I am following along with your online lectures, I thought I would post here and ask you. I hope this is fine, and if not I will post it as a separate topic under #fastai-users.

Just to play around with fastai2, I thought I would try to train a model for Bengali.AI Kaggle competition. However, I am struggling with creating the DataBlock.

In particular, the competition has 3 separate targets for a single image. So basically, the loss function needs to get three predicted targets and three actual targets. I tried a few things. Here's what I have so far:

data = DataBlock(blocks=(ImageBlock,CategoryBlock),
                 get_x=ColReader(['image_id'],pref=TRAIN,suff='.png'),
                 splitter=IndexSplitter(list(range(fold*len(df)//nfolds,(fold+1)*len(df)//nfolds))),
                 get_y=ColReader(['grapheme_root','vowel_diacritic','consonant_diacritic']),
                 batch_tfms=aug_transforms(do_flip=False,max_warp=0.1,size=sz)
                )

Based on playing around with data.summary(df) (as you had described a little bit in your lecture 3 IIRC), I realize that get_x, splitter, and batch_tfms are most likely correct. However, I think I may not be using the appropriate blocks and get_y. I tried passing in three CategoryBlocks, three ColReaders, or both, but none of these work. I have also tried MultiCategoryBlock, but it always tries to apply OneHotEncode.

In fastai v1, I think if you just passed all the target columns to the col argument, it would work, but here it doesn't seem to work that way.

Is there an easy way to do it with the DataBlock API? Otherwise, do I have to create some form of a Pipeline?

So just to be clear, you tried declaring 3 CategoryBlocks and then passed a list of ColReaders as a get_y?

If not, try keeping the 3 CategoryBlocks and making a get_y that returns the tuple. Also make sure n_inp = 1 when you're passing in these multiple y's too.

Yes, I tried declaring three CategoryBlocks and passing a list of ColReaders to get_y, one for each target column:

get_y=[ColReader(['grapheme_root']),ColReader(['vowel_diacritic']),ColReader(['consonant_diacritic'])]

This didn't work. I also just tried this:

data = DataBlock(blocks=(ImageBlock,CategoryBlock, CategoryBlock, CategoryBlock),
                 get_x=ColReader(['image_id'],pref=TRAIN,suff='.png'),
                 splitter=IndexSplitter(list(range(fold*len(df)//nfolds,(fold+1)*len(df)//nfolds))),
                 get_y=ColReader(['grapheme_root','vowel_diacritic','consonant_diacritic']),
                 batch_tfms=aug_transforms(do_flip=False,max_warp=0.1,size=sz),
                 n_inp = 1
                )

This also didn't work and threw an error.

How do I make sure get_y returns a tuple?

This will be jumping ahead to next week, but take a look at this notebook; it's an example of setting up getters for object detection (where our getters first act on an x, then two y's): https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Computer%20Vision/06_Object_Detection.ipynb

(If I had time I'd try to make a quick kernel describing the databunch for that task, as it's a great example; I'll try to if I can.)
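For reference, a rough sketch of that getters pattern (hedged; img2bbox is a hypothetical dict mapping an image filename to its (bounding boxes, labels) pair, not something taken verbatim from the notebook):

pascal = DataBlock(blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),     # one input, two targets
                   get_items=get_image_files,
                   splitter=RandomSplitter(),
                   getters=[lambda o: o,                             # x: the image file itself
                            lambda o: img2bbox[o.name][0],           # y1: the bounding boxes
                            lambda o: img2bbox[o.name][1]],          # y2: the box labels
                   item_tfms=Resize(224),
                   batch_tfms=aug_transforms(),
                   n_inp=1)                                          # only the first block is an input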


Ah, you are saying to use getters instead of get_x and get_y?

Yes, sorry, not at my computer right now, but that's what I was trying to say at least. get_y (I think) assumes only one y; at the very least that's an assumption it is built around. getters allows creative freedom for any number of inputs and outputs.


Thanks! It looks like this works! I didn't know about the difference between get_x/get_y and getters. It's definitely helpful that the getters allow for custom inputs and outputs.
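For reference, the setup that ended up working looks roughly like this (a sketch, assuming the same df, TRAIN, fold/nfolds, and sz as in the attempts above):

data = DataBlock(blocks=(ImageBlock, CategoryBlock, CategoryBlock, CategoryBlock),
                 getters=[ColReader('image_id', pref=TRAIN, suff='.png'),   # x: the image
                          ColReader('grapheme_root'),                       # y1
                          ColReader('vowel_diacritic'),                     # y2
                          ColReader('consonant_diacritic')],                # y3
                 splitter=IndexSplitter(list(range(fold*len(df)//nfolds,(fold+1)*len(df)//nfolds))),
                 batch_tfms=aug_transforms(do_flip=False, max_warp=0.1, size=sz),
                 n_inp=1)   # one input, three targets
dls = data.dataloaders(df)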


Docker deployment question: I forked the deployment notebook, but the FROM source then doesn't have the things needed per the default requirements.txt? (Also, it complained about torch being referenced as 'pytorch' in requirements.txt.)

I changed to the non-slim version but still hit issues. Does anyone have a working requirements.txt and Dockerfile I can leverage for 2.0.0.8?
Thanks!

I'm working on exactly that so I can post a guide once I get it working. It's close, but I'm hitting compat issues between what fastai2 wants and what FROM sources have… hard to get it all balanced atm.


Don't know the answer to your Q1. For Q2, my guess is that slice(lr) is just being consistent with the other cases, and maybe lr would work too (I have not tried it; I will post an update when I do). For Q3, I think the 3 values correspond to the 3 (default?) parameter groups that are used (and I could be wrong), i.e. not to encoder/decoder, but to learning rates for different parameter groups in the encoder itself. (That does beg the question of how the decoder learning rates are set, but given that the decoder is built up from the encoder, I would guess that the decoder lrs are derived(?) from the encoder lrs. Maybe someone could confirm this.)
(See https://docs.fast.ai/vision.learner.html, towards the end, for cut and split_on.)

I think that the 3 values of lr are, in order, used for the first and second param groups of the backbone (in our case a resnet), and the third lr is for the head. So when the model is frozen, I would actually think that lr is for the head. After unfreeze, the first couple of lrs would be for the 2 param groups of the backbone (the encoder) and the third for the head.

All of the above is based on my poking around and I could be totally wrong, in which case I would very much appreciate corrections to my understanding.

lr works too, and it seems like I get the same results for training when the model is frozen.
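One way to poke at this yourself (a sketch; I am assuming fastai2's Optimizer exposes its per-group hyperparameters as opt.hypers and that set_hypers accepts a slice, spreading it across the groups the splitter created):

from fastai2.vision.all import *

learn = unet_learner(dls, resnet34)          # dls as set up earlier in the thread
learn.create_opt()                           # build the optimizer without training
print(len(learn.opt.hypers))                 # number of parameter groups (3 with the default split)
learn.opt.set_hypers(lr=slice(1e-5, 1e-3))   # roughly what passing slice(lr) to fit does
print([h['lr'] for h in learn.opt.hypers])   # per-group learning rates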

I'm sure you did, but did you try fastai's requirements.txt plus whatever else is needed for your server code?

I understand that part for the resnet, but I'm not sure how it translates to a unet. Plus, what is the head in this case?

Go explore unet_learner's code. We still use an encoder and a head :wink:
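One quick way to do that (a sketch; in a notebook, unet_learner?? and DynamicUnet?? do the same thing, and I am assuming DynamicUnet is exposed by the star import):

import inspect
from fastai2.vision.all import *

print(inspect.getsource(unet_learner))   # cuts the body (encoder) out of the arch, then wraps it in a DynamicUnet
print(inspect.getsource(DynamicUnet))    # the UnetBlocks and the final layers live in here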

@barnacl for a hint:


My guess would be a few layers to predict the class of each pixel, i.e. a classifier that outputs the probability values for each of the max number of objects in the images, or 32 classes (if I recall correctly).

  1. This is the bottom of the unet (where the encoder connects to the decoder).
  2. We can see the conv is not trainable (part of the encoder); see the sketch below the list for one way to check this.
  3. Not sure why we have two batchnorms.
  4. The conv is trainable, so part of the decoder.
    Right before we reach 1 we have 512 x 12 x 15 (so it is 512 feature maps).
    Still figuring out what the head is; will add to this if I have more answers (though I keep having more questions :grimacing:).
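A sketch of one way to check point 2 above: with the learner still frozen, the encoder's conv weights should report requires_grad=False (batchnorm layers are usually left trainable) while the decoder's weights stay True (learn here is assumed to be the unet_learner from the earlier discussion):

for name, p in learn.model.named_parameters():
    print(f'{name:60s} trainable={p.requires_grad}')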

@barnacl everything after layers I believe:

You can see we get some UnetBlocks followed, at the very end, by a ConvLayer (since we output a 'picture' (our masks) instead of a class, like a Linear layer would).
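A sketch of one way to see that ordering for yourself, assuming DynamicUnet keeps its submodules in a .layers ModuleList (it subclasses SequentialEx), with learn being the unet_learner from above:

for m in learn.model.layers:
    print(type(m).__name__)   # expect the encoder first, some middle layers, several UnetBlocks, and a ConvLayer at the end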


thanks @muellerzr that helps :slight_smile:


Is there any reference explaining PixelShuffle and ICNR? Also, I'm not able to understand the blur parameter of unet_config. I'm aware it adds ReplicationPad, and I found that with blur=True, generated images are smoothed as opposed to the jagged ones with blur=False.
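Not a full answer, but I believe the usual references are Shi et al. 2016, "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network" (sub-pixel convolution / PixelShuffle), and Aitken et al. 2017, "Checkerboard artifact free sub-pixel convolution" (the ICNR init). A minimal sketch of the layer in question, as I understand fastai2's API: PixelShuffle_ICNR wraps a conv plus nn.PixelShuffle, initializes the conv with ICNR, and with blur=True appends a ReplicationPad2d + AvgPool2d that smooths the checkerboard artifacts you are describing:

from fastai2.vision.all import *

up = PixelShuffle_ICNR(64, 32, blur=True)   # 64 -> 32 channels, 2x spatial upsample
print(up)                                   # conv, PixelShuffle, and (with blur) pad + avg-pool
x = torch.randn(1, 64, 16, 16)
print(up(x).shape)                          # expected: torch.Size([1, 32, 32, 32])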