A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

Look at the TensorPoint transform and you’ll see we normalize our points to a percentage (-100% to +100%) from the center of the image. In this messy dataset some points were labeled even when they weren’t actually on the image, so we could get values greater than 100% or less than -100%. So we want to clamp a point down if it is not present. This is commonly done with the COCO dataset and other keypoint datasets (in their ground truths a point is set to -1,-1 if it isn’t present).

The points are never resized (per se); they stay at -1,-1 (or 0,0). We add this as a batch transform at the end because it simply looks at all the points after all our augmentation is done, sees if any point is outside the range we want (-1, 1), and clamps it to -1,-1 if any part of it is. Does this help @mgloria? (I’m more than happy to explain this as much as I can, because this is a very important detail that isn’t talked about much in the fastai library.)
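
For anyone following along, here is a minimal sketch of what such a clamping batch transform could look like (not the exact transform from the notebook; the class name and ordering are my own, and it assumes points already scaled to fastai’s -1..1 TensorPoint range):

    from fastai.vision.all import *

    class ClampPointsBatch(Transform):
        "Mark any point that falls outside (-1, 1) as 'not present' by setting it to (-1, -1)"
        order = 100  # run after the other batch augmentations
        def encodes(self, x:TensorPoint):
            # x is (batch, n_points, 2); flag points where either coordinate is out of range
            out_of_bounds = (x < -1).any(dim=-1) | (x > 1).any(dim=-1)
            x[out_of_bounds] = -1.
            return x

You would then tack it onto the end of your batch_tfms so it sees the points after every other augmentation has moved them around.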

1 Like

We pass detach=False in the Unet because we want to differentiate through the skip connections. That’s the only place we do it, IIRC.
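
Just to make the effect of that flag concrete, a toy sketch (not the DynamicUnet source) of what detach does to hooked activations:

    import torch
    from torchvision.models import resnet18
    from fastai.vision.all import hook_outputs

    net = resnet18(pretrained=False)
    hooks = hook_outputs([net.layer1, net.layer2], detach=False)  # keep activations attached to the graph
    _ = net(torch.randn(2, 3, 64, 64))
    print([o.requires_grad for o in hooks.stored])  # [True, True] -> gradients can flow back through them
    hooks.remove()

With detach=True (the default) the stored activations are cut off from the autograd graph, which is fine for inspection but would break a skip connection or a feature loss.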

1 Like

2 more questions:

  1. Why do we need `dls.c = dls.train.after_item.c`?
    I see later that we are manually passing n_out=18, so I do not see where dls.c is actually being used.

  2. If we only have 512 channels, how are we passing 1024 as input to the AdaptiveAvgPool2d and AdaptiveMaxPool2d (even if each needs 512)? How is this possible?
    Usually these layers are used to reduce the image size, while the number of channels stays the same.

@sgugger he’s discussing FeatureLoss. This can also be found in the SuperRes notebook from the course (I don’t know the answer, but here’s the loss function from that notebook):

    class FeatureLoss(Module):
        "Pixel loss plus feature and gram-matrix (style) losses from a hooked feature model"
        def __init__(self, m_feat, layer_ids, layer_wgts):
            self.m_feat = m_feat
            self.loss_features = [self.m_feat[i] for i in layer_ids]
            # detach=False keeps the hooked activations attached to the autograd graph
            self.hooks = hook_outputs(self.loss_features, detach=False)
            self.wgts = layer_wgts
            self.metric_names = (['pixel']
                                 + [f'feat_{i}' for i in range(len(layer_ids))]
                                 + [f'gram_{i}' for i in range(len(layer_ids))])

        def make_features(self, x, clone=False):
            # Run the feature model and collect the activations stored by the hooks
            self.m_feat(x)
            return [(o.clone() if clone else o) for o in self.hooks.stored]

        def forward(self, input, target, reduction='mean'):
            out_feat = self.make_features(target, clone=True)
            in_feat = self.make_features(input)
            # Pixel loss, then weighted feature losses, then weighted gram-matrix losses
            self.feat_losses = [base_loss(input, target, reduction=reduction)]
            self.feat_losses += [base_loss(f_in, f_out, reduction=reduction)*w
                                 for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
            self.feat_losses += [base_loss(gram_matrix(f_in), gram_matrix(f_out), reduction=reduction)*w**2 * 5e3
                                 for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
            if reduction=='none':
                self.feat_losses = [f.mean(dim=[1,2,3]) for f in self.feat_losses[:4]] + [f.mean(dim=[1,2]) for f in self.feat_losses[4:]]
            # Store each component so it can be reported as a metric
            for n,l in zip(self.metric_names, self.feat_losses): setattr(self, n, l)
            return sum(self.feat_losses)

        def __del__(self): self.hooks.remove()

Note that here, too, we do not detach (the hooks are created with detach=False).
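
For completeness, the helpers that loss relies on look roughly like this in the course superres notebook (quoting from memory, so double-check against your copy):

    import torch.nn.functional as F

    # Plain L1 loss for the pixel and feature terms
    base_loss = F.l1_loss

    def gram_matrix(x):
        "Channel-by-channel correlation of the activations, used for the style terms"
        n,c,h,w = x.size()
        x = x.view(n, c, -1)
        return (x @ x.transpose(1,2)) / (c*h*w)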

thanks a lot for the reply! I will look into the code in detail and comment again if still unclear :slight_smile:

dls.c comes into play if we pass our DataLoaders to cnn_learner: it will read this to figure out how many outputs we want in our head (when we pass n_out ourselves, that takes precedence).
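
The relevant logic is roughly this (a simplified sketch, not the exact fastai source, and infer_n_out is just an illustrative name):

    # If n_out isn't given, fall back to the DataLoaders' c attribute to size the head
    def infer_n_out(dls, n_out=None):
        if n_out is None: n_out = getattr(dls, 'c', None)
        assert n_out, "set dls.c or pass n_out so we know how many outputs the head needs"
        return n_out

So when you pass n_out=18 explicitly, dls.c is never consulted; setting it is just a convenience so cnn_learner can infer the head size on its own.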

It gets split between the two, so each pool sees 512 (they both run at once, and their outputs are concatenated back into 1024). Another thing to explore for this is how create_head uses the input filters :wink:
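
That concatenation is done by fastai’s AdaptiveConcatPool2d. Roughly (a simplified re-implementation for illustration, not the fastai source):

    import torch
    import torch.nn as nn

    class ConcatPoolSketch(nn.Module):
        "Run adaptive max and average pooling on the same input and concatenate the results"
        def __init__(self, size=1):
            super().__init__()
            self.ap = nn.AdaptiveAvgPool2d(size)
            self.mp = nn.AdaptiveMaxPool2d(size)
        def forward(self, x):
            return torch.cat([self.mp(x), self.ap(x)], dim=1)

    x = torch.randn(2, 512, 7, 7)       # what the backbone hands us: 512 channels
    print(ConcatPoolSketch()(x).shape)  # torch.Size([2, 1024, 1, 1]) -> the 1024 the head works with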

And here:

Thanks @mgloria. So in the case of style transfer we capture the layer-specific activations and compare them to the target to generate the loss, and this loss needs to be differentiated to backprop and reduce the loss, hence detach=False. Got it.

1 Like

Same thing, you need the gradients of those I believe.

1 Like

Yes - I understood that you detach when you DO NOT need gradients. I did not grasp that gradients were needed when doing feature loss, which I now get from @mgloria’s explanation. Thanks

you don’t detach right?

1 Like

Hi @muellerzr, I just went through the notebooks from the last video. I see you manually calculated the mean and std to normalize the dataset in the Bengali notebook, but you were using a pretrained model. I don’t understand why you chose to use the dataset’s mean and std when (I would have thought) you should have used imagenet_stats to normalize?

Because I don’t have a 3-channel image (normalizing with ImageNet stats assumes 3 channels). To make up for that I also changed the first conv layer to accept our 2D image. Jeremy’s rule of thumb: ALWAYS use transfer learning when and where you can :slight_smile:
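
One common way to do that first-conv swap (a sketch only, assuming a single-channel grayscale input for illustration; the actual Bengali notebook may do it differently) is to replace conv1 and initialize it from the mean of the pretrained RGB filters, so the pretrained features aren’t thrown away:

    import torch
    import torch.nn as nn
    from torchvision.models import resnet34

    model = resnet34(pretrained=True)
    old_conv = model.conv1                     # Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
    new_conv = nn.Conv2d(1, old_conv.out_channels,
                         kernel_size=old_conv.kernel_size,
                         stride=old_conv.stride,
                         padding=old_conv.padding,
                         bias=False)
    with torch.no_grad():
        # Average the pretrained filters over the RGB dimension so the new conv starts from something sensible
        new_conv.weight.copy_(old_conv.weight.mean(dim=1, keepdim=True))
    model.conv1 = new_conv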

2 Likes

I see, now it makes sense. I have always treated grayscale images as 3-channel images by using PIL’s .convert("RGB") function. I thought that would be a better approach, since you don’t need to mess with the model at all and imagenet_stats can be used to normalize. Which is the better approach in your view?

Well, it’s not a 3-channel image, so personally I’d rather not keep it as one :slight_smile: And the dataset is only 2 channels, so I’d still need its stats even if I made it 3 channels, because it’s still not similar enough to ImageNet to use their stats (IMO). Which then raises the question of why I still use their weights: again, always go pretrained. You can get at least something from their weights.

1 Like

If I’m not wrong, theoretically the 3-channel image has the same info, right? As it’s just a linear transformation from 2 channels to 3 channels?

Yes, so I don’t know which is better or worse :slight_smile: Maybe they’re the exact same! Many ways to skin a cat. This is just one method I saw being used for situations like this :blush:

1 Like

I think converting the grayscale to 3-channel would be better, as you’re not throwing away the very first layer. Having said that, a CNN is a powerful model, so I also think it doesn’t matter much which method one uses. Regarding transfer learning: do you think transferring from huge hand-drawn datasets like quickdraw would give better performance? Has anyone done that?

No idea. Possibly.

1 Like

Zach,

Is your style transfer example based on Gatys’s 2015 “A Neural Algorithm of Artistic Style” (what Jeremy describes as “the original way to do it” in his video)?

1 Like

Yes it is! We replicate that technique (which is also what Jeremy did in part 2 a few years back).

You can see the parallels if you explore the source code here:

1 Like