A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

I think converting the grayscale images to 3-channel would be better, as you’re not throwing away the very first layer. That said, a CNN is a powerful model, so I also think it doesn’t matter much which method one uses. Regarding transfer learning, do you think transferring from a huge hand-drawn dataset like QuickDraw would give better performance? Has anyone done that?
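For reference, a minimal sketch of that grayscale-to-3-channel conversion in plain PyTorch (the shapes are just an example): repeating the single channel three times lets a pretrained 3-channel first layer be used unchanged.

import torch

gray = torch.rand(1, 224, 224)     # a 1-channel image tensor
rgb_like = gray.repeat(3, 1, 1)    # shape (3, 224, 224), same data in every channel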

No idea. Possibly.


Zach,

Is your style transfer example based on Gatys’s 2015 “A Neural Algorithm of Artistic Style” (what Jeremy describes as “the original way to do it” in his video)?


Yes it is! We replicate that technique (which is also what Jeremy did in part 2 a few years back).

You can see the parallels if you explore the source code here:
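(Not the linked source itself, but as a rough illustration of the core Gatys idea: the style loss compares Gram matrices of feature maps, along these lines.)

import torch

def gram_matrix(feats):
    # Gram matrix of a batch of feature maps, the heart of the Gatys-style loss:
    # channel-to-channel correlations, normalised by the feature-map size
    b, c, h, w = feats.shape
    f = feats.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)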


Thanks a lot @muellerzr. Awesome reply!!

So in this case we are doing dls.c = dls.train.after_item.c because, since we are not using cnn_learner but just Learner, we need to specify it explicitly. Correct?

Thanks a lot for the head code link!! It really helped me a lot. I see now how the number of channels gets multiplied by 2. What does *body.children() mean?
It manages to return the number of channels (i.e. the number of filters, nf), but I do not understand the syntax.

Finally, I have a request for the last image lecture: could you walk us through the DataBlock API code? I mean the output of DataBlock??

The reason is that I usually code in an IDE, not in Colab. In an IDE I know how to set breakpoints and follow the execution line by line until I understand it all. Here I do not have that deep understanding of the code, which makes mistakes difficult to debug in more complex statements like

[screenshot of the statement in question]

I believe this is what I am really missing…

Not quite. If we were using cnn_learner, we would still need to specify it. And since we pass in dls.c to our create_head, we’re mimicking how cnn_learner works.

I think this resource may help you:

https://spandan-madan.github.io/A-Collection-of-important-tasks-in-pytorch/

children() allows us to see the submodules inside of the body. num_features_model lets us grab the output size of the last layer (we don’t pass an input because it figures out what works for us).
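Putting those pieces together, and as an assumption about the fastai2 API at the time of the course (check the notebook for the exact version), the Learner setup looks roughly like this:

from fastai.vision.all import *

# create_body cuts the classification head off a pretrained architecture
body = create_body(resnet34, pretrained=True)

# num_features_model figures out the body's output channels for us; the *2
# accounts for AdaptiveConcatPool2d in the head (avg pool + max pool concatenated).
# Newer versions may do this doubling inside create_head instead.
nf = num_features_model(nn.Sequential(*body.children())) * 2

# dls.c is the number of outputs, set explicitly since we aren't using cnn_learner
head = create_head(nf, dls.c)
net = nn.Sequential(body, head)
learn = Learner(dls, net, loss_func=CrossEntropyLossFlat())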

Sure, I can try to do that.

@vijayabhaskar I took a look at your question; imagenet_stats is simply a tuple, ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), these being the means and stds for the 3 channels, as pointed out by @muellerzr.

However, to be fully honest I do not see how they are applied. Here is the code of Normalize.from_stats(*imagenet_stats); it seems to me the actual normalization happens in encodes, but I do not see this function being called…

[screenshot of the Normalize.from_stats source]


Remember, our encodes happen as we go down our transformation pipeline. So it happens at the tail end (see how order = 99); it’s the last thing that happens. And our TensorImage is one giant matrix, so we can subtract and divide all its values by our passed-in mean and standard deviation.
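In other words, the arithmetic is just a broadcasted subtract and divide over the whole batch (a minimal stand-alone sketch, not the actual fastai source):

import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std  = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

batch = torch.rand(4, 3, 224, 224)   # a stand-in for a TensorImage batch
normed = (batch - mean) / std        # per-channel normalization, broadcast over the batch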

How do you know about the order? Where can I find it for other cases? I looked into the code of Rotate() (via Rotate??) but I do not see the order being specified in this case. I thought these batch_tfms were executed in the order they were specified in batch_tfms, e.g.

batch_tfms = [IntToFloatTensor(), *aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]

and I struggle to find where Normalize.encodes() is being called :face_with_monocle:

You don’t have to call encodes explicitly. It is similar to how we never call forward on an nn.Module.


If you go look at the parent (AffineCoordTfm) you’ll see it too has an order, which is 30.

No, we have a hierarchy now, which gives us freedom to define when and how we want particular transforms to be done.

It’s being called as we go down the pipeline. Go look at how our dblock.summary() works. You can see the magic happening (especially where I talked about attempting to do it without a DataBlock).
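If you want to check the ordering yourself, every transform instance carries an order attribute that the Pipeline sorts on; a quick inspection (assuming the usual fastai2 imports) looks like:

from fastai.vision.all import *

tfms = [IntToFloatTensor(), Flip(), Rotate(), Normalize.from_stats(*imagenet_stats)]
for t in tfms:
    print(type(t).__name__, t.order)
# The Pipeline runs these sorted by `order` (Normalize last, at 99),
# not in the order you listed them in batch_tfms.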


By the way, with dblock.summary(), the bug where Normalize would break everything was fixed :slight_smile: (should come out in the next version, for right now it’s on dev)


By the way, back to our discussion of the getters, this also works for our bounding box example:

getters = [
  noop,
  lambda o: img2bbox[o.name][0],
  lambda o: img2bbox[o.name][1]
]

block = DataBlock(blocks=...,
                  splitter = ...,
                  get_items = get_image_files,
                  getters=getters)

Essentially we already define get_items, and so we can work off of its result for everything that’s not the first :slight_smile:
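For context, img2bbox here is assumed to be a dict keyed by file name, with each value a (boxes, labels) pair; a hypothetical sketch of its shape (adapt to however your annotations are loaded):

# getters[1] pulls out the bounding boxes, getters[2] the labels
img2bbox = {
    'img_001.jpg': ([[10, 20, 150, 200], [30, 40, 90, 120]], ['dog', 'cat']),
    'img_002.jpg': ([[5, 5, 60, 80]], ['dog']),
}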

I was uncertain about something in 06_Keypoint_Regression … with this code

item_tfms = [Resize(448, method='squish')]
batch_tfms = [Flip(), Rotate(), Zoom(), Warp(), ClampBatch()]

Since the batch transforms happen after the item ones, isn’t it possible that the Zoom batch transform could make one of the tensor points fall outside the dimensions of the image?

One other thing I’m not sure about: this line in 06_Keypoint_Regression

dblock.summary('')

prints out

Applying batch_tfms to the batch built
  Pipeline: ClampBatch -> IntToFloatTensor -> AffineCoordTfm
    starting from
      (TensorImage of size 4x3x448x448, TensorPoint of size 4x9x2)
    applying ClampBatch gives
      (TensorImage of size 4x3x448x448, TensorPoint of size 4x9x2)
    applying IntToFloatTensor gives
      (TensorImage of size 4x3x448x448, TensorPoint of size 4x9x2)
    applying AffineCoordTfm gives
      (TensorImage of size 4x3x448x448, TensorPoint of size 4x9x2)

I would have expected to also see mention of the other batch transforms … flip, rotate, zoom, warp, but the only one of those I see is ClampBatch.

Like I said in reply to the comment above, those are all AffineCoordTfms (this is why we can include them).

If we look at the Zoom transform (which should probably be zoom):


We can see it uses typedispatch on our TensorPoints to take this into account :slight_smile: But yes, I think it could potentially land a bit outside.
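A quick way to convince yourself that Flip, Rotate, Zoom and Warp all share that parent (an assumption about the fastai2 class hierarchy, worth checking against your installed version):

from fastai.vision.all import *

# If these all derive from AffineCoordTfm, the Pipeline can compose them into
# the single AffineCoordTfm shown by dblock.summary()
for tfm in (Flip(), Rotate(), Zoom(), Warp()):
    print(type(tfm).__name__, isinstance(tfm, AffineCoordTfm))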


Yes, imagenet_stats has the mean and std of the ImageNet dataset. If you’re using ImageNet weights you should normalise with these stats. If you use different stats, the model won’t train as you expect. You can clearly see this in Keras: if you don’t preprocess as they did while training on ImageNet, you will get a model that massively overfits the training set.

@muellerzr I was trying out the Bengali.AI competition with fastai2. I cropped the images and stored them in ‘images/train’ and ‘images/test’. Since the DataBlock path is different, I couldn’t find a way to predict from the dataframe created from test.csv; can you help me with this? learn.predict(image_path) throws some error.

You should do a test_dl and pass in the list of file names from the test dataframe. Look at the starter kernel I referenced in our notebook. It shows how to do the inference. Also here:

https://www.kaggle.com/mnpinto/bengali-ai-fastai2-starter-lb0-9598/comments#752294
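Roughly, and with the paths and column name as placeholders (not the kernel’s exact code), the pattern is:

import pandas as pd

# 'image_id' and the directories here are assumptions; adapt to your test.csv layout
test_df = pd.read_csv(path/'test.csv')
test_files = [path/'images/test'/f for f in test_df['image_id']]

test_dl = learn.dls.test_dl(test_files)   # reuses the training-time transforms
preds, _ = learn.get_preds(dl=test_dl)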

Yes. The double negatives get/got to me. Don’t detach if you need gradients; detach if you do NOT need them.
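A tiny PyTorch example of that rule of thumb:

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
print(y.requires_grad)             # True  -> gradients can flow back to x
print(y.detach().requires_grad)    # False -> detach() cuts the graph, no gradients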
