Can somebody explain, or link a paper about, clusters and scales? I think the clusters are basically what Jeremy was using for the anchor boxes last night, when we started out with all of them even (1 cluster) and then added different sizes (so for YOLOv3 it would be 9 clusters). But does "scale" mean that each of these will be a different size? In other words, will there be 27 total anchor boxes: the different dimensions (clusters), each in a small, medium, and large size (scale)?
From the YOLOv3 paper, last paragraph of section 2.3:

"We still use k-means clustering to determine our bounding box priors. We just sort of chose 9 clusters and 3 scales arbitrarily and then divide up the clusters evenly across scales. On the COCO dataset the 9 clusters were: (10×13), (16×30), (33×23), (30×61), (62×45), (59×119), (116×90), (156×198), (373×326)."
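Note that "divide up the clusters evenly across scales" means 9 anchors total, 3 per scale, not 27. Here's a minimal sketch of that recipe, assuming plain Euclidean k-means on (width, height) pairs rather than the 1 − IoU distance the YOLO papers actually use (the box sizes below are random stand-ins, not COCO data):

```python
import numpy as np

def kmeans_anchors(wh, k=9, n_iter=50, seed=0):
    """Cluster (width, height) pairs into k anchor priors.

    Simplified sketch: plain Euclidean k-means. YOLO's real clustering
    uses a 1 - IoU distance so large boxes don't dominate the metric."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assign each box to its nearest center
        dist = ((wh[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        # move each center to the mean of its assigned boxes
        for j in range(k):
            if (assign == j).any():
                centers[j] = wh[assign == j].mean(axis=0)
    return centers

rng = np.random.default_rng(1)
wh = rng.uniform(5, 400, size=(500, 2))  # stand-in for ground-truth box sizes
anchors = kmeans_anchors(wh, k=9)
# "divide up the clusters evenly across scales": sort by area, 3 per scale,
# giving 9 anchors total (3 at each of the small/medium/large scales)
by_area = anchors[np.argsort(anchors.prod(axis=1))]
scales = by_area.reshape(3, 3, 2)        # (scale, anchor, [w, h])
```

So each scale's prediction grid gets its own 3 priors, roughly matched to the object sizes that scale is responsible for.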
You can do that by building a model that predicts just one number: the amount of rotation. It’s actually a great class project to try; previous students have found it helpful.
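A minimal sketch of such a single-number regressor (the architecture here is hypothetical, just any conv body ending in one linear output trained with a regression loss):

```python
import torch
import torch.nn as nn

class RotationRegressor(nn.Module):
    """Tiny illustrative model: conv body -> one output unit (rotation)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 1)  # just one number: the rotation

    def forward(self, x):
        return self.head(self.body(x)).squeeze(-1)

model = RotationRegressor()
x = torch.randn(4, 3, 64, 64)               # batch of 4 fake images
pred = model(x)                             # shape (4,), one angle each
loss = nn.functional.mse_loss(pred, torch.zeros(4))  # regression loss
```

In practice you'd use a pretrained body instead of training from scratch, but the key point is the head: a single continuous output with MSE (or L1) loss instead of a classifier.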
I would expect so, since otherwise the gradients will be overwhelmed by the term with the larger scale. But I haven’t tested this intuition to know how much it matters - it would be an interesting thing to experiment on and write about.
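To illustrate that intuition with invented numbers: when two loss terms live on very different scales, the gradient through any shared parameter is dominated by the larger term unless you rescale one of them:

```python
# One shared parameter w feeding two squared-error terms on very
# different scales (targets are made up, purely for illustration).
w = 0.0
t_big, t_small = 200.0, 0.5          # e.g. pixel coordinates vs. a probability
grad_big = 2 * (w - t_big)           # d/dw (w - t_big)^2  -> magnitude ~400
grad_small = 2 * (w - t_small)       # magnitude ~1
total_grad = grad_big + grad_small   # dominated by the large-scale term
ratio = abs(grad_big) / abs(grad_small)

# One common remedy: weight the big term down so both contribute comparably.
alpha = abs(grad_small) / abs(grad_big)
balanced_grad = alpha * grad_big + grad_small
```

The weighting constant is usually just tuned by hand (or the targets are normalized to similar ranges before computing the loss).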
Not yet - very little of the stuff shown in part 2 has been integrated into fastai as yet. It’s all new stuff and we’re using it to help you understand all the moving parts of deep learning and its implementation.
It’s a terrific paper and highly recommended. The author is, apparently, sick of the BS required to actually get published and so has decided to conspicuously do the opposite, which I think is quite awesome…
Let’s all do our best to be generous in our interpretations of people’s words. We’re all doing our best to figure things out here, and sometimes that means we’ll all write stuff which turns out to be not quite right - and that’s fine, because the following discussion will help resolve it.
I was wondering why we initialize our biases to -3 (then -4) in the output convolutional layers of our models. In the code, it’s in the OutConv class, in this line:
self.oconv1.bias.data.zero_().add_(bias)
where bias is set as an argument of SSD_Head and SSD_MultiHead.
I’m guessing it’s there to help the model train at the beginning (since SGD will update those biases anyway). I’ve tried setting it to 0, and it’s true we don’t get to the same losses in the same number of epochs. What’s the reason for this?
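One way to see what that bias does (my reading, not a confirmed answer from the thread): the classification outputs go through a sigmoid, so the bias sets the initial predicted probability for each class. A quick check:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# With a zero bias every class starts at probability 0.5; with bias -3 or -4
# the initial "object present" probability is small, which matches the fact
# that the vast majority of anchor boxes contain only background.
p0 = sigmoid(0.0)    # 0.5
p3 = sigmoid(-3.0)   # ~0.047
p4 = sigmoid(-4.0)   # ~0.018
```

Starting near the true background rate means the early gradients aren't dominated by thousands of anchors all confidently wrong in the same direction, which would plausibly explain the faster early training you observed.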