Can somebody explain, or link a paper about, clusters and scales? I think the clusters are basically what Jeremy was using for the anchor boxes last night, when we started out with all of them even (1 cluster) and then added different sizes (so for YOLOv3 it would be 9 clusters). But does "scale" mean that each of these will be a different size? In other words, will there be 27 total anchor boxes: the different dimensions (clusters), each in a small, medium, and large size (scale)?
From the YOLOv3 paper, last paragraph of section 2.3:

"We still use k-means clustering to determine our bounding box priors. We just sort of chose 9 clusters and 3 scales arbitrarily and then divide up the clusters evenly across scales. On the COCO dataset the 9 clusters were: (10×13), (16×30), (33×23), (30×61), (62×45), (59×119), (116×90), (156×198), (373×326)."
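Note that "divide up the clusters evenly across scales" means 9 anchors total, 3 per scale, not 27. Here's a minimal sketch of that recipe, assuming plain Euclidean k-means on (width, height) pairs rather than the 1 − IoU distance the YOLO papers actually use (the box sizes below are random stand-ins, not COCO data):

```python
import numpy as np

def kmeans_anchors(wh, k=9, n_iter=50, seed=0):
    """Cluster (width, height) pairs into k anchor priors.

    Simplified sketch: plain Euclidean k-means. YOLO's real clustering
    uses a 1 - IoU distance so large boxes don't dominate the metric."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assign each box to its nearest center
        dist = ((wh[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        # move each center to the mean of its assigned boxes
        for j in range(k):
            if (assign == j).any():
                centers[j] = wh[assign == j].mean(axis=0)
    return centers

rng = np.random.default_rng(1)
wh = rng.uniform(5, 400, size=(500, 2))  # stand-in for ground-truth box sizes
anchors = kmeans_anchors(wh, k=9)
# "divide up the clusters evenly across scales": sort by area, 3 per scale,
# giving 9 anchors total (3 at each of the small/medium/large scales)
by_area = anchors[np.argsort(anchors.prod(axis=1))]
scales = by_area.reshape(3, 3, 2)        # (scale, anchor, [w, h])
```

So each scale's prediction grid gets its own 3 priors, roughly matched to the object sizes that scale is responsible for.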
You can do that by building a model that predicts just one number: the amount of rotation. It’s actually a great class project to try; previous students have found it helpful.
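A minimal sketch of such a single-number regressor (the architecture here is hypothetical, just any conv body ending in one linear output trained with a regression loss):

```python
import torch
import torch.nn as nn

class RotationRegressor(nn.Module):
    """Tiny illustrative model: conv body -> one output unit (rotation)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 1)  # just one number: the rotation

    def forward(self, x):
        return self.head(self.body(x)).squeeze(-1)

model = RotationRegressor()
x = torch.randn(4, 3, 64, 64)               # batch of 4 fake images
pred = model(x)                             # shape (4,), one angle each
loss = nn.functional.mse_loss(pred, torch.zeros(4))  # regression loss
```

In practice you'd use a pretrained body instead of training from scratch, but the key point is the head: a single continuous output with MSE (or L1) loss instead of a classifier.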
I would expect so, since otherwise the gradients will be overwhelmed by the term with the larger scale. But I haven’t tested this intuition to know how much it matters - it would be an interesting thing to experiment on and write about.
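To illustrate that intuition with invented numbers: when two loss terms live on very different scales, the gradient through any shared parameter is dominated by the larger term unless you rescale one of them:

```python
# One shared parameter w feeding two squared-error terms on very
# different scales (targets are made up, purely for illustration).
w = 0.0
t_big, t_small = 200.0, 0.5          # e.g. pixel coordinates vs. a probability
grad_big = 2 * (w - t_big)           # d/dw (w - t_big)^2  -> magnitude ~400
grad_small = 2 * (w - t_small)       # magnitude ~1
total_grad = grad_big + grad_small   # dominated by the large-scale term
ratio = abs(grad_big) / abs(grad_small)

# One common remedy: weight the big term down so both contribute comparably.
alpha = abs(grad_small) / abs(grad_big)
balanced_grad = alpha * grad_big + grad_small
```

The weighting constant is usually just tuned by hand (or the targets are normalized to similar ranges before computing the loss).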
Not yet - very little of the stuff shown in part 2 has been integrated into fastai as yet. It’s all new stuff and we’re using it to help you understand all the moving parts of deep learning and its implementation.
It’s a terrific paper and highly recommended. The author is, apparently, sick of the BS required to actually get published and so has decided to conspicuously do the opposite, which I think is quite awesome…
Let’s all do our best to be generous in our interpretations of people’s words. We’re all doing our best to figure things out here, and sometimes that means we’ll all write stuff which turns out to be not quite right - and that’s fine, because the following discussion will help resolve it.
I was wondering why we initialize our biases to -3 (then -4) in the output convolutional layers of our models. In the code, it’s in the OutConv class, in this line:
self.oconv1.bias.data.zero_().add_(bias)
where bias is set as an argument of SSD_Head and SSD_MultiHead.
I’m guessing it’s there to help the model train at the beginning (since SGD will update those biases anyway). I’ve tried setting it to 0, and it’s true we don’t get to the same losses in the same number of epochs. What’s the reason for this?
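One way to see what that bias does (my reading, not a confirmed answer from the thread): the classification outputs go through a sigmoid, so the bias sets the initial predicted probability for each class. A quick check:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# With a zero bias every class starts at probability 0.5; with bias -3 or -4
# the initial "object present" probability is small, which matches the fact
# that the vast majority of anchor boxes contain only background.
p0 = sigmoid(0.0)    # 0.5
p3 = sigmoid(-3.0)   # ~0.047
p4 = sigmoid(-4.0)   # ~0.018
```

Starting near the true background rate means the early gradients aren't dominated by thousands of anchors all confidently wrong in the same direction, which would plausibly explain the faster early training you observed.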