Part 2 Lesson 9 wiki


(Hiromi Suenaga) #281

Ah. Thanks :slight_smile:


#282

I was wondering why we initialize our biases to -3 (then -4) in the output convolutional layers of our models. In the code it's set in the OutConv class, on this line:

self.oconv1.bias.data.zero_().add_(bias)

where bias is set as an argument of SSD_Head and SSD_MultiHead.

I'm guessing it's to help the model train at the beginning (since those biases will change as SGD is applied). I've tried putting 0, and indeed we don't reach the same losses in the same number of epochs. What's the reason for this?


(Jeremy Howard) #283

I wanted the starting values for the output sigmoids to have a probability that's pretty low, since we know most bounding box activations won't match most classes. I haven't tested carefully whether it makes any significant difference, however; I'd be interested to hear what you find.
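
A quick way to see the effect (just an illustration, not from the notebook): the class activations go through a sigmoid, so a bias of -3 or -4 means every anchor box starts out predicting a very low probability for every class, which matches the fact that most anchors are background.

import torch

# Starting probability implied by each bias value, before any training
for b in (0., -3., -4.):
    print(b, torch.sigmoid(torch.tensor(b)).item())
# 0.0  -> 0.5    (each class as likely present as absent)
# -3.0 -> ~0.047 (class probably absent)
# -4.0 -> ~0.018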


(Hiromi Suenaga) #284

It worked! Time for me to start reviewing :slight_smile:


(Jeremy Howard) #285

(Even Oldridge) #286

I tuned in a few minutes too late and missed the first highlight that Jeremy shared, the one before the awesome VAE blog post. Does anyone remember what it was? I often find those snippets very valuable, and I was planning to go back and rewatch that part, but the re-recording didn't include it.


#287

Makes sense! I'll try to play around with this tomorrow, it'll be good practice. :grinning:


(Lucas Goulart Vazquez) #288

In the Bbox per cell section, one of the transforms is:

RandomLighting(0.05, 0.05, tfm_y=TfmType.COORD)

Why are we making a lighting transform coordinated with the bounding boxes?


(Jeremy Howard) #289

You can check the video! :slight_smile:


(Jeremy Howard) #290

As mentioned in class, it’s redundant - doesn’t actually do anything here. But doesn’t hurt either.
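
For context, the augmentation list in the notebook looks roughly like the snippet below (old fastai 0.7-style API, reconstructed from memory rather than quoted). Flip and rotate move pixels around, so the bounding-box coords must be transformed in lockstep via tfm_y=TfmType.COORD; lighting only rescales pixel values, so its tfm_y has nothing to act on and could just as well be dropped.

from fastai.transforms import RandomFlip, RandomRotate, RandomLighting, TfmType

augs = [RandomFlip(tfm_y=TfmType.COORD),        # moves pixels -> must move coords too
        RandomRotate(30, tfm_y=TfmType.COORD),  # moves pixels -> must move coords too
        RandomLighting(0.05, 0.05)]             # pixel values only -> tfm_y is redundant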


(Suvash) #291

Finally got around to catching up with the second half of the lecture. That was pretty intense :crazy_face: ! The conceptual explanation was super :ok_hand: , will now have to think more about it :thinking: and mess around with the notebooks.

I somehow had this random idea during the first half of the lecture that if we made the anchor boxes small enough (1px), we could then start classifying at the pixel level. But clearly I hadn't thought it through, and my dreams are shattered now. :sob:


(Jeremy Howard) #292

Sure you can :slight_smile: As long as you have pixel-level labels. And if you do, it’s called segmentation, and we’ll be learning that later in this course…


(Suvash) #293

Omg yes !


(Jeremy Howard) #294

I found a bug in flatten_conv in the pascal-multi notebook. Fixed now. Makes finding the smaller objects look much better when k>1.
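
For anyone comparing against an older copy of the notebook, the fixed flatten_conv looks roughly like this (reconstructed from memory, not a verbatim diff). The key point is permuting the channel dimension to the end before the view, so the 4+c activations belonging to each anchor box stay grouped together once k>1.

import torch

def flatten_conv(x, k):
    # x: (bs, nf, gy, gx) output of an output conv, where nf = k*(4+c)
    bs, nf, gx, gy = x.size()
    x = x.permute(0, 2, 3, 1).contiguous()  # channels last, so each anchor's block is contiguous
    return x.view(bs, -1, nf // k)          # (bs, gy*gx*k, 4+c)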


(Rudraksh Tuwani) #295

Ah, yes. But what if we want to do it in a single pass for images which contain multiple objects of interest? Say you want to detect all the faces in an image and straighten them all in a single pass. I'm still skeptical about the use cases, though; for faces we'd probably be better off using face alignment models.


(Hiromi Suenaga) #297

I need a little bit of help here with BatchNorm :slightly_frowning_face:

Here I think Jeremy says “BatchNorm is meant to move towards a [0, 1] random variable”. Wouldn’t BN try to make the mean of the activation closer to 0 and standard deviation closer to 1, so activations should be (hopefully) somewhere between -1 and 1?


(Jeremy Howard) #298

Nearly - a standard deviation of 1 means ~68% should fall between -1 and 1, if normally distributed. https://en.wikipedia.org/wiki/68–95–99.7_rule

Although there are also the beta and gamma coeffs (the affine transform) in the batchnorm layer, which can then pull the mean and std away from 0,1 again…
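
A small PyTorch illustration of both points (not from the lesson notebooks): the normalization pushes activations towards mean 0 / std 1, and the learnable affine parameters (weight = gamma, bias = beta) can then shift them away again.

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(1)                 # gamma = bn.weight, beta = bn.bias
x = torch.randn(1000, 1) * 5 + 3       # activations with mean ~3, std ~5
y = bn(x)
print(y.mean().item(), y.std().item())            # ~0 and ~1 at init (gamma=1, beta=0)
with torch.no_grad():                  # pretend training moved the affine params
    bn.weight.fill_(2.0)
    bn.bias.fill_(0.5)
print(bn(x).mean().item(), bn(x).std().item())    # ~0.5 and ~2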


(Hiromi Suenaga) #299

Ohhhh, I got it. 0, 1 is not a range; they correspond to the mean and std. I was scratching my head because there must be negative values that we don't want to chop off with the ReLU right after.

Thank you so much for the quick response!!


(Kevin Bird) #300

I'm rewatching the video (thanks for getting it up so quickly) and have some questions about the point where you put the conv2d onto your model and get a 2x2x(4+c) output. So this would take the image and split it into 4 parts, and then each of those parts would be compared against 4+c filters to see which one activates it the most, is that correct? So going back to Part 1, if the dog-eye filter was one of the 4+c filters, this would activate once one of the 4 parts looked like a dog eye, am I saying that correctly?


(Jeremy Howard) #301

No. The ‘matching problem’ is solved by figuring out which ground truth object has the highest jaccard overlap with that grid cell.

After doing that, we take each matched object and the activations corresponding to the matched anchor box, and calculate L1 loss of the 4 coord activations, and binary cross entropy loss of the c+1 class activations (but removing the background class).
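
A rough sketch of that matching plus loss for a single image (hypothetical names and shapes, not the notebook's exact ssd_1_loss): overlaps is the Jaccard matrix between ground-truth boxes and anchor boxes, matched anchors get an L1 loss on their 4 coord activations, and the class activations get a binary cross entropy where unmatched anchors keep an all-zero target, which plays the role of the background class.

import torch
import torch.nn.functional as F

def ssd_loss_sketch(b_clas, b_bb, gt_clas, gt_bbox, overlaps, n_classes):
    # b_clas: (n_anchors, n_classes) class activations; b_bb: (n_anchors, 4) coord activations
    # gt_clas: (n_objects,) class ids; gt_bbox: (n_objects, 4); overlaps: (n_objects, n_anchors)
    gt_overlap, gt_idx = overlaps.max(0)      # best object for each anchor
    _, prior_idx = overlaps.max(1)            # best anchor for each object
    gt_overlap[prior_idx] = 1.99              # force-match every object's best anchor
    for obj, anc in enumerate(prior_idx): gt_idx[anc] = obj
    pos = gt_overlap > 0.4                    # anchors matched to some ground-truth object
    # localization: L1 on the coords of matched anchors only
    loc_loss = F.l1_loss(b_bb[pos], gt_bbox[gt_idx[pos]])
    # classification: BCE over the real classes; background is the all-zero target
    targ = torch.zeros_like(b_clas)
    targ[pos] = F.one_hot(gt_clas[gt_idx[pos]], n_classes).float()
    clas_loss = F.binary_cross_entropy_with_logits(b_clas, targ)
    return loc_loss + clas_loss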