Part 2 Lesson 9 wiki

Not quite? As I understand it, anchor boxes are just ways we decide to divide up the image, we’re not really trying to guess where things are. Though maybe you could see generating many different kinds of anchor boxes as a kind of “guessing”?

1 Like

So… why are we dividing up the image? I’m fundamentally missing why we need anchor boxes.

1 Like

SSD Link

This paper helps with understanding

4 Likes

anchor boxes are fixed, pre-determined boxes. you can choose their number and aspect ratios.
you are trying to maximize the IoU (Jaccard index) between each bounding-box prediction (just 4 coordinates) and each anchor box
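To make "fixed, pre-determined boxes" concrete, here is a minimal sketch of tiling anchors over an image. The grid size, scales, and ratios chosen here are arbitrary illustrative values, not the ones from the lesson:

```python
import itertools

def make_anchors(grid=4, sizes=(0.75, 1.0), ratios=(1.0, 0.5, 2.0)):
    """Tile a grid x grid set of anchor boxes over the unit image,
    one anchor per (size, ratio) combination at each cell.
    Returns (cx, cy, w, h) tuples in [0, 1] coordinates."""
    anchors = []
    step = 1.0 / grid
    for i, j in itertools.product(range(grid), range(grid)):
        cx, cy = (j + 0.5) * step, (i + 0.5) * step
        for s, r in itertools.product(sizes, ratios):
            w = step * s * r ** 0.5   # wider for ratio > 1
            h = step * s / r ** 0.5   # taller for ratio < 1
            anchors.append((cx, cy, w, h))
    return anchors
```

So a 4x4 grid with 2 sizes and 3 ratios gives 4 * 4 * 2 * 3 = 96 anchors, all decided before the model ever sees a training image.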

7 Likes

Are there any materials online about the anchor boxes and related techniques Jeremy introduces in this lecture? That would make these concepts easier to understand.


to localize the object detections

@rachel Can you have him give a two-sentence sum-up of what we’re doing with the anchor boxes and how it relates to the “final product”? Like how do we know what ‘ground truth’ is in test/production?

1 Like

So when you say Ground Truth, this is something that somebody hand labeled probably, correct?

2 Likes

I believe so, yes.

2 Likes

Excel helped me a lot when I was going through the Retinanet paper.

15 Likes

On ‘background’:

IIRC the main contribution of the Focal Loss (RetinaNet) paper is a tunable loss that down-weights easy background activations, so that the objects you want to see don’t get drowned out (you can see a lot of the detections are 0 or bg).
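The focal loss from the paper is just cross-entropy scaled by a (1 - p_t)^gamma factor, so confident (easy) background predictions contribute almost nothing. A minimal single-example sketch:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    p: predicted probability of the positive class; y: label, 0 or 1."""
    pt = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - pt) ** gamma * math.log(pt)
```

A background anchor (y=0) predicted at p=0.05 is already "easy": its plain cross-entropy would be -log(0.95) ≈ 0.051, but the (1 - p_t)^2 factor shrinks its focal loss by orders of magnitude, which is exactly what stops thousands of background anchors from swamping the few object anchors.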

anchor boxes:

Also an idea from a YOLOv2 talk: they used k-means clustering to find the most common bounding-box shapes in the training data, and based the anchor boxes on those.
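The k-means idea can be sketched as clustering (w, h) box shapes with 1 - IoU as the distance (the YOLOv2 paper uses this IoU-based distance rather than Euclidean; everything else below is a plain hand-rolled k-means, not their actual code):

```python
import random

def kmeans_anchors(whs, k=2, iters=50, seed=0):
    """Cluster (w, h) box shapes into k anchor shapes, using
    1 - IoU (boxes compared centred on each other) as the distance."""
    def iou_wh(a, b):
        inter = min(a[0], b[0]) * min(a[1], b[1])
        return inter / (a[0] * a[1] + b[0] * b[1] - inter)

    rng = random.Random(seed)
    centers = rng.sample(whs, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for wh in whs:
            # assign each shape to the centre it overlaps most
            best = max(range(k), key=lambda i: iou_wh(wh, centers[i]))
            clusters[best].append(wh)
        centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers
```

Run on a dataset's ground-truth boxes, the returned centres are the anchor shapes that best cover the shapes actually present in the data.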

The v1 CVPR video – this is from before they used learned anchor boxes, but it’s still useful as a starting point. The v2 video, I think, talks about how they find the anchors.

13 Likes

Yes. Ground truth are the actual annotations given to us, along with the class labels.

3 Likes

it should be SSD (not xD), Single Shot Detection.

Could we consider the new final ground truth object a dense representation of what was sparsely present in the dataset out of the box?

What is the loss function describing, physically?

Correct me if I’m wrong… Since YOLO outputs just 4 coordinates plus the class outputs, it is not able to detect multiple objects, right? Is there only one bounding box in YOLO’s predictions?

2 Likes

Okay, I think I get it: the anchor boxes are only for calculating the loss function?

2 Likes

My understanding, might be wrong though, is that anchor boxes are predefined in ratio, size, and location. The more varieties one defines, the higher the IoU with the ground truth can be. (I studied YOLOv2, so this is my understanding.)

I would suggest @jeremy assign some papers to read relating to the upcoming class. Maybe this helps.

Like the implementation details though, i.e. adding the null class and removing it.

7 Likes

Jaccard index = area of intersection/area of union

this is for matching the anchor boxes with the ground-truth bounding boxes
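That formula as a tiny helper (a sketch; the (x1, y1, x2, y2) corner format is an assumption, not the lesson’s code):

```python
def jaccard(box_a, box_b):
    """IoU / Jaccard index of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # 0 if no overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

e.g. two unit-offset 2x2 boxes overlap in a 1x1 square, giving 1 / (4 + 4 - 1) = 1/7.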

1 Like

Definitely would appreciate a few papers or articles from Jeremy or others!