[Draft Blogpost] Lesson 9: SSD in plain English

As most of you know, Lesson 9's got a pretty steep learning curve, as is generally characteristic for Part 2. I found myself struggling to keep all the moving parts in mind as Jeremy works through SSD: the architecture, the loss function, and all the related elements.
I've lost count of how many times I've replayed the lecture, and I finally feel like I understand the topic sufficiently to write a "plain English" post on it. I'm hoping that with a top-level idea of how SSD works, the various pieces of the lecture become easier to place in context.
Before throwing it out there, would one of you be willing to spend 10 minutes going through it to check that I'm not completely making things up here?



I like this statement a lot; I think it can be very helpful to someone learning this:

No problem, I’ll help you out: I’ll just feed you a pre-defined list of anchor boxes you can use. All you have to do is shift them around a little, or maybe scale them a bit, so that they contain whatever is in the vicinity of that box. Oh, and I’ll need you to tell me what the class of the object you’ve got in those boxes is.
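To make the "shift them around a little, or maybe scale them a bit" part concrete, here's a minimal sketch of how predicted offsets can adjust a pre-defined anchor box. The exact parameterisation in the SSD paper and in the lesson's notebook differs in detail; this is just illustrative, and `apply_offsets` is a made-up helper name.

```python
import math

# Boxes are (center_x, center_y, width, height).
# Illustrative only: a simplified version of the offset scheme used by SSD-style detectors.
def apply_offsets(anchor, offsets):
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (
        cx + dx * w,        # shift the center by a fraction of the anchor's size
        cy + dy * h,
        w * math.exp(dw),   # scale width/height; exp keeps them positive
        h * math.exp(dh),
    )

# Zero offsets leave the anchor unchanged.
print(apply_offsets((0.5, 0.5, 0.2, 0.2), (0.0, 0.0, 0.0, 0.0)))
```

So the network never predicts boxes from scratch; it only predicts four small numbers per anchor, which nudge and resize a box that already roughly covers the right part of the image.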

On the loss function, I am not sure the paragraph is very clear. Overlap information is only used for assigning ground-truth boxes / predictions to anchor boxes. It is non-differentiable, so we cannot use it to backpropagate the error. The loss is based on incorrectly predicting the class (or saying an object of some class is assigned to an anchor box when in actuality there is none) and the error in the offset predictions.
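The matching stage described above can be sketched in a few lines. This is an assumption-laden toy version (the helper names `iou` and `match` are mine, boxes are `(x1, y1, x2, y2)`, and real implementations also force the best anchor per ground-truth box to match), but it shows why matching is a hard, non-differentiable decision: it only selects which anchors the class and offset losses are computed on.

```python
# Jaccard / IoU overlap between two boxes given as (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Toy matching stage: each anchor is assigned the ground-truth box it overlaps
# most, if that overlap clears a threshold; otherwise it is background (None).
# This is a discrete selection -- no gradient flows through it; gradients only
# flow through the class/offset predictions of the anchors it picks out.
def match(anchors, gt_boxes, threshold=0.5):
    matches = []
    for a in anchors:
        overlaps = [iou(a, g) for g in gt_boxes]
        best = max(range(len(gt_boxes)), key=lambda i: overlaps[i])
        matches.append(best if overlaps[best] >= threshold else None)
    return matches

anchors = [(0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0)]
gt = [(0.0, 0.0, 1.0, 0.9)]
print(match(anchors, gt))  # first anchor matches ground truth 0, second is background
```

Once the matches are fixed, the actual (differentiable) loss is just classification loss on every anchor plus offset-regression loss on the matched ones.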

As for the last paragraph - it's single shot because there is a single network we send the image through. Other architectures might consist of multiple stages, hence the name. For instance, we might first have some model that detects regions of interest, another stage that does classification on those regions, another stage to refine predictions, etc.

Thanks Radek, that’s really good feedback! It’s forced me to reconsider some of my understanding and solidify it.

I’ve re-written the part on the loss function to draw a clearer distinction between the matching stage and the loss calculation. I hope that does a better job of clarifying that matching is just about deciding whether we should calculate the loss, but doesn’t actually affect the loss calculation itself (i.e. matching is not involved in backprop).

I’ve published the post now, but if there’s more feedback I’d be happy to hear it!