A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

@muellerzr I have a fastai2 question not necessarily related to your notebooks, but since I am following along with your online lectures, I thought I would post here and ask you. I hope this is fine, and if not I will post it as a separate topic under #fastai-users.

Just to play around with fastai2, I thought I would try to train a model for Bengali.AI Kaggle competition. However, I am struggling with creating the DataBlock.

In particular, the competition has 3 separate targets for a single image. So basically, the loss function needs to get three predicted targets and three actual targets. I tried a few things. Here's what I have so far:

data = DataBlock(blocks=(ImageBlock,CategoryBlock),
                 get_x=ColReader(['image_id'],pref=TRAIN,suff='.png'),
                 splitter=IndexSplitter(list(range(fold*len(df)//nfolds,(fold+1)*len(df)//nfolds))),
                 get_y=ColReader(['grapheme_root','vowel_diacritic','consonant_diacritic']),
                 batch_tfms=aug_transforms(do_flip=False,max_warp=0.1,size=sz)
                )

Based on playing around with data.summary(df) (as you had described a little bit in your lecture 3 IIRC), I realize that get_x, splitter, and batch_tfms are most likely correct. However, I think I may not be using the appropriate blocks and get_y. I tried passing in three CategoryBlocks, three ColReaders, or both, but none of these work. I have also tried MultiCategoryBlock, but it always tries to apply OneHotEncode.

In fastai v1, I think if you just passed all the target columns to the col argument, it would work, but here it doesn't seem to work that way.

Is there an easy way to do it with the DataBlock API? Otherwise, do I have to create some form of a Pipeline?

So just to be clear, you tried declaring 3 CategoryBlocks and then passed a list of ColReaders as a get_y?

If not, try keeping the 3 CategoryBlocks and making a get_y that returns the tuple. Also make sure n_inp = 1 when you're passing in these multiple y's too.

Yes, I tried declaring three CategoryBlocks and passing a list of ColReaders to get_y, one for each target column:

get_y=[ColReader(['grapheme_root']),ColReader(['vowel_diacritic']),ColReader(['consonant_diacritic'])]

This didn't work. I also just tried this:

data = DataBlock(blocks=(ImageBlock,CategoryBlock, CategoryBlock, CategoryBlock),
                 get_x=ColReader(['image_id'],pref=TRAIN,suff='.png'),
                 splitter=IndexSplitter(list(range(fold*len(df)//nfolds,(fold+1)*len(df)//nfolds))),
                 get_y=ColReader(['grapheme_root','vowel_diacritic','consonant_diacritic']),
                 batch_tfms=aug_transforms(do_flip=False,max_warp=0.1,size=sz),
                 n_inp = 1
                )

This also didn't work and threw an error.

How do I make sure get_y returns a tuple?

This will be jumping ahead to next week, but take a look at this notebook; it's an example of setting up getters for object detection (where our getters first act on an x, then two y's): https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Computer%20Vision/06_Object_Detection.ipynb

(If I had time I'd try to make a quick kernel describing the databunch for that task, as it's a great example; I'll try to if I can.)
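For reference, a rough sketch of that getters pattern (hedged; img2bbox is a hypothetical dict mapping an image filename to its (bounding boxes, labels) pair, not something taken verbatim from the notebook):

pascal = DataBlock(blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),     # one input, two targets
                   get_items=get_image_files,
                   splitter=RandomSplitter(),
                   getters=[lambda o: o,                             # x: the image file itself
                            lambda o: img2bbox[o.name][0],           # y1: the bounding boxes
                            lambda o: img2bbox[o.name][1]],          # y2: the box labels
                   item_tfms=Resize(224),
                   batch_tfms=aug_transforms(),
                   n_inp=1)                                          # only the first block is an input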


Ah, you are saying to use getters instead of get_x and get_y?

Yes, sorry, not at my computer right now, but that's what I was trying to say at least. get_y (I think) assumes only one y; at the very least that's an assumption it is built around. getters allows creative freedom for any number of inputs and outputs.


Thanks! It looks like this works! I didn't know about the difference between get_x/get_y and getters. It's definitely helpful that the getters allow for custom inputs and outputs.
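For reference, the setup that ended up working looks roughly like this (a sketch, assuming the same df, TRAIN, fold/nfolds, and sz as in the attempts above):

data = DataBlock(blocks=(ImageBlock, CategoryBlock, CategoryBlock, CategoryBlock),
                 getters=[ColReader('image_id', pref=TRAIN, suff='.png'),   # x: the image
                          ColReader('grapheme_root'),                       # y1
                          ColReader('vowel_diacritic'),                     # y2
                          ColReader('consonant_diacritic')],                # y3
                 splitter=IndexSplitter(list(range(fold*len(df)//nfolds,(fold+1)*len(df)//nfolds))),
                 batch_tfms=aug_transforms(do_flip=False, max_warp=0.1, size=sz),
                 n_inp=1)   # one input, three targets
dls = data.dataloaders(df)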


Docker deployment question: I forked the deployment notebook, but the FROM source then doesn't have the things needed per the default requirements.txt? (Also, it complained about torch being referenced as 'pytorch' in requirements.txt.)

I changed to the non-slim version but still hit issues. Does anyone have a working requirements.txt and Dockerfile I can leverage for 2.0.0.8?
Thanks!

I'm working on exactly that so I can post a guide once I get it working. It's close, but I'm hitting compat issues between what fastai2 wants and what FROM sources have… hard to get it all balanced atm.


Don't know the answer to your Q1. For Q2, my guess is that slice(lr) is just being consistent with the other cases, and maybe lr would work too (I have not tried it; I will post an update when I do). For Q3, I think the 3 values correspond to the 3 (default?) parameter groups that are used (and I could be wrong), i.e. not to encoder/decoder, but to learning rates for different parameter groups in the encoder itself. (That does beg the question of how the decoder learning rates are set, but given that the decoder is built up from the encoder, I would guess that the decoder lrs are derived(?) from the encoder lrs. Maybe someone could confirm this.)
(See https://docs.fast.ai/vision.learner.html, towards the end, for cut and split_on.)

I think that the 3 values of lr are, in order, used for the first and second param groups of the backbone (in our case a resnet), and the third lr is for the head. So when the model is frozen, I would actually think that lr is for the head. After unfreeze, the first couple of lrs would be for the 2 param groups of the backbone (the encoder) and the third for the head.

All of the above is based on my poking around and I could be totally wrong, in which case I would very much appreciate corrections to my understanding.

lr works too, and it seems like I get the same results for training when the model is frozen.
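One way to poke at this yourself (a sketch; I am assuming fastai2's Optimizer exposes its per-group hyperparameters as opt.hypers and that set_hypers accepts a slice, spreading it across the groups the splitter created):

from fastai2.vision.all import *

learn = unet_learner(dls, resnet34)          # dls as set up earlier in the thread
learn.create_opt()                           # build the optimizer without training
print(len(learn.opt.hypers))                 # number of parameter groups (3 with the default split)
learn.opt.set_hypers(lr=slice(1e-5, 1e-3))   # roughly what passing slice(lr) to fit does
print([h['lr'] for h in learn.opt.hypers])   # per-group learning rates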

I'm sure you did, but did you try fastai's requirements.txt plus whatever else is needed for your server code?

I understand that part for the resnet, but I'm not sure how it translates to a unet. Plus, what is the head in this case?

Go explore unet_learner's code. We still use an encoder and a head :wink:
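One quick way to do that (a sketch; in a notebook, unet_learner?? and DynamicUnet?? do the same thing, and I am assuming DynamicUnet is exposed by the star import):

import inspect
from fastai2.vision.all import *

print(inspect.getsource(unet_learner))   # cuts the body (encoder) out of the arch, then wraps it in a DynamicUnet
print(inspect.getsource(DynamicUnet))    # the UnetBlocks and the final layers live in here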

@barnacl for a hint:


My guess would be a few layers to predict the class of each pixel, i.e. a classifier that outputs the probability values for each of the max number of objects in the images, or 32 classes (if I recall correctly).

  1. This is the bottom of the unet (where the encoder connects to the decoder).
  2. We can see the conv is not trainable (part of the encoder); see the sketch below the list for one way to check this.
  3. Not sure why we have two batchnorms.
  4. The conv is trainable, so part of the decoder.
    Right before we reach 1 we have 512 x 12 x 15 (so it is 512 feature maps).
    Still figuring out what the head is; will add to this if I have more answers (though I keep having more questions :grimacing:).
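A sketch of one way to check point 2 above: with the learner still frozen, the encoder's conv weights should report requires_grad=False (batchnorm layers are usually left trainable) while the decoder's weights stay True (learn here is assumed to be the unet_learner from the earlier discussion):

for name, p in learn.model.named_parameters():
    print(f'{name:60s} trainable={p.requires_grad}')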

@barnacl everything after layers I believe:

You can see we get some UnetBlocks followed, at the very end, by a ConvLayer (since we output a 'picture' (our masks) instead of a class, like a Linear layer would).
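A sketch of one way to see that ordering for yourself, assuming DynamicUnet keeps its submodules in a .layers ModuleList (it subclasses SequentialEx), with learn being the unet_learner from above:

for m in learn.model.layers:
    print(type(m).__name__)   # expect the encoder first, some middle layers, several UnetBlocks, and a ConvLayer at the end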


thanks @muellerzr that helps :slight_smile:


Is there any reference explaining PixelShuffle and ICNR? Also, I'm not able to understand the blur parameter of unet_config. I'm aware it adds ReplicationPad, and I found that with blur=True, generated images are smoothed as opposed to the jagged ones with blur=False.
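Not a full answer, but I believe the usual references are Shi et al. 2016, "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network" (sub-pixel convolution / PixelShuffle), and Aitken et al. 2017, "Checkerboard artifact free sub-pixel convolution" (the ICNR init). A minimal sketch of the layer in question, as I understand fastai2's API: PixelShuffle_ICNR wraps a conv plus nn.PixelShuffle, initializes the conv with ICNR, and with blur=True appends a ReplicationPad2d + AvgPool2d that smooths the checkerboard artifacts you are describing:

from fastai2.vision.all import *

up = PixelShuffle_ICNR(64, 32, blur=True)   # 64 -> 32 channels, 2x spatial upsample
print(up)                                   # conv, PixelShuffle, and (with blur) pad + avg-pool
x = torch.randn(1, 64, 16, 16)
print(up(x).shape)                          # expected: torch.Size([1, 32, 32, 32])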