Part 2 Lesson 9 wiki

(Kevin Bird) #261

Can somebody explain or link a paper about clusters and scales? I think the clusters are basically what Jeremy was using for the anchor boxes last night when we started out with all of them even (1 cluster) and then added different sizes (so for YOLOv3 it would have 9 clusters). But would the scale mean that each of these will be a different size, so does this mean that there will be 27 total anchor boxes of the different dimensions (clusters) and each will have a small, medium, and large size (scale)?

From the last paragraph of section 2.3 of the YOLOv3 paper:

We still use k-means clustering to determine our bounding box priors. We just sort of chose 9 clusters and 3 scales arbitrarily and then divide up the clusters evenly across scales. On the COCO dataset the 9 clusters were: (10 × 13),(16 × 30),(33 × 23),(30 × 61),(62 × 45),(59 × 119),(116 × 90),(156 × 198),(373 × 326).
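
One way to read "divide up the clusters evenly across scales" is that each scale gets 3 of the 9 priors, rather than every prior being repeated at every scale. Here is a minimal sketch of that reading. The cluster sizes are the ones listed in the YOLOv3 paper; sorting by area and the variable names are my own assumptions for illustration, not something the quoted paragraph specifies.

```python
# Box priors (width, height) in pixels, as listed in the YOLOv3 paper for COCO.
clusters = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
            (59, 119), (116, 90), (156, 198), (373, 326)]

num_scales = 3
per_scale = len(clusters) // num_scales  # 9 clusters / 3 scales = 3 per scale

# Assumption: order the priors by area so each scale gets similarly sized boxes.
by_area = sorted(clusters, key=lambda wh: wh[0] * wh[1])

# Split into consecutive chunks, one chunk per scale.
groups = [by_area[i:i + per_scale] for i in range(0, len(by_area), per_scale)]

# Under this reading the priors are partitioned, not multiplied.
total_priors = sum(len(g) for g in groups)

for scale_idx, group in enumerate(groups):
    print(f"scale {scale_idx}: {group}")
print("total priors:", total_priors)
```

Under this reading there are 9 distinct priors in total (3 per scale), not 27.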

(Jeremy Howard (Admin)) #262

Hey folks don’t forget this is a wiki thread - so please copy useful links over to the top post so they’re all in one central place! :slight_smile:

(Jeremy Howard (Admin)) #263

You can do that by building a model that predicts just one number - the amount of rotation. It’s actually a great class project to try that previous students have found helpful.

(Jeremy Howard (Admin)) #264

Remind me to do this next week if I forget. There should be a paper coming out on Arxiv tomorrow that discusses it.

(Jeremy Howard (Admin)) #265

I would expect so, since otherwise the gradients will be overwhelmed by the bit with the larger scale. But I haven’t tested this intuition to know how much it matters - it would be an interesting thing to experiment on and write about.

(Jeremy Howard (Admin)) #267

Not yet - very little of the stuff shown in part 2 has been integrated into fastai as yet. It’s all new stuff and we’re using it to help you understand all the moving parts of deep learning and its implementation.

(Brian Muhia) #268

Is there an updated link with the first half of the livestream? I’m reviewing and can’t see it yet.

(Jeremy Howard (Admin)) #269

Nearly! 3 zoom * 3 aspect = k = 9. At 3 scales, 4x4 (16) + 2x2 (4) + 1x1 (1) = 21. 21*9 = 189
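
The same arithmetic as a small sketch, in case it helps to see it spelled out. The variable names are just illustrative; the numbers are the ones above.

```python
# Anchor boxes per grid cell: every zoom level combined with every aspect ratio.
num_zooms = 3
num_aspects = 3
k = num_zooms * num_aspects

# Grid side lengths at the three scales: 4x4, 2x2 and 1x1.
grid_sizes = [4, 2, 1]
num_cells = sum(g * g for g in grid_sizes)

# Every grid cell at every scale carries k anchor boxes.
total_anchors = num_cells * k

print(k, num_cells, total_anchors)
```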

(Jeremy Howard (Admin)) #270

I always post an edited video ~24 hours after the lecture is done. I’m working on it :slight_smile:

(Suvash Thapaliya) #271

Thanks again Jeremy.

(Jeremy Howard (Admin)) #272

It’s a terrific paper and highly recommended. The author is, apparently, sick of the BS required to actually get published and so has decided to conspicuously do the opposite, which I think is quite awesome…

(Jeremy Howard (Admin)) #273

Let’s all do our best to be generous in our interpretations of people’s words. We’re all doing our best to figure things out here, and sometimes that means we’ll all write stuff which turns out to be not quite right - and that’s fine, because the following discussion will help resolve it.

(Kevin Bird) #274

It really was a fantastic read. I love this style.

(Jeremy Howard (Admin)) #276

I’ve updated the top post now with many of the resources recommended by you all in this thread.

(Jeremy Howard (Admin)) #277

I’ve now posted the edited video to the top post.

(Hiromi Suenaga) #278

For some reason, the video does not load for me. All I can see is the title and a black box.

(James Requa) #279

Yeah, it doesn’t load for me either.

(Jeremy Howard (Admin)) #280

Youtube is still processing it. Should be done in ~5 mins.

edit: actually it’s taking a really long time. dunno if there’s a youtube problem. will re-upload soon if it doesn’t appear

(Hiromi Suenaga) #281

Ah. Thanks :slight_smile:


I was wondering why we initialize our biases to -3 (then -4) in the output convolutional layers of our models. In the code it’s in the OutConv class in this line

where bias is set as an argument of SSD_Head and SSD_MultiHead.

I’m guessing it’s to help the model train at the beginning (since those biases will change as SGD is applied). I’ve tried setting the bias to 0, and it’s true that we don’t reach the same losses in the same number of epochs. What’s the reason for this?
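
Not an answer, but a quick way to see what those bias values mean numerically. This is a sketch under two assumptions that are not stated above: that the class outputs go through a sigmoid, and that the layer's weights start small enough for the bias to dominate the initial activation.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function: maps an activation to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Assumption: with near-zero weights, the initial pre-sigmoid activation is
# roughly the bias, so sigmoid(bias) approximates the initial predicted probability.
initial_probs = {bias: sigmoid(bias) for bias in (0.0, -3.0, -4.0)}

for bias, p in initial_probs.items():
    print(f"bias={bias:+.1f} -> initial probability ~ {p:.3f}")
```

With a bias of 0 every output starts at a probability of 0.5, while -3 and -4 start it at roughly 0.047 and 0.018.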