Part 2 Lesson 9 wiki

Not quite? As I understand it, anchor boxes are just ways we decide to divide up the image, we’re not really trying to guess where things are. Though maybe you could see generating many different kinds of anchor boxes as a kind of “guessing”?

1 Like

So… why are we dividing up the image? I’m fundamentally missing why we need anchor boxes.

1 Like

SSD Link

This paper helps with understanding

4 Likes

anchor boxes are fixed, pre-determined boxes. you can choose their number and aspect ratios.
you are trying to maximize the IoU (Jaccard index) between each bounding-box prediction (just 4 coordinates) and each anchor box
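To make "fixed, pre-determined boxes" concrete, here is a minimal sketch of tiling anchors over an image. The grid size, scales, and ratios chosen here are arbitrary illustrative values, not the ones from the lesson:

```python
import itertools

def make_anchors(grid=4, sizes=(0.75, 1.0), ratios=(1.0, 0.5, 2.0)):
    """Tile a grid x grid set of anchor boxes over the unit image,
    one anchor per (size, ratio) combination at each cell.
    Returns (cx, cy, w, h) tuples in [0, 1] coordinates."""
    anchors = []
    step = 1.0 / grid
    for i, j in itertools.product(range(grid), range(grid)):
        cx, cy = (j + 0.5) * step, (i + 0.5) * step
        for s, r in itertools.product(sizes, ratios):
            w = step * s * r ** 0.5   # wider for ratio > 1
            h = step * s / r ** 0.5   # taller for ratio < 1
            anchors.append((cx, cy, w, h))
    return anchors
```

So a 4x4 grid with 2 sizes and 3 ratios gives 4 * 4 * 2 * 3 = 96 anchors, all decided before the model ever sees a training image.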

7 Likes

Are there any materials online about the anchor boxes and related techniques Jeremy introduces in this lecture? That would make these concepts easier to understand.


to localize the object detections

@rachel Can you have him give a two-sentence sum-up of what we’re doing with the anchor boxes and how it relates to the “final product”? Like how do we know what ‘ground truth’ is in test/production?

1 Like

So when you say Ground Truth, this is something that somebody hand labeled probably, correct?

2 Likes

I believe so, yes.

2 Likes

Excel helped me a lot when I was going through the Retinanet paper.

15 Likes

On ‘background’:

IIRC the main contribution of the Focal Loss (RetinaNet) paper is a tunable loss that down-weights easy background activations, so that the objects you want to see don’t get drowned out (you can see a lot of the detections are 0 or bg).
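The focal loss from the paper is just cross-entropy scaled by a (1 - p_t)^gamma factor, so confident (easy) background predictions contribute almost nothing. A minimal single-example sketch:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    p: predicted probability of the positive class; y: label, 0 or 1."""
    pt = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - pt) ** gamma * math.log(pt)
```

A background anchor (y=0) predicted at p=0.05 is already "easy": its plain cross-entropy would be -log(0.95) ≈ 0.051, but the (1 - p_t)^2 factor shrinks its focal loss by orders of magnitude, which is exactly what stops thousands of background anchors from swamping the few object anchors.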

anchor boxes:

Also an idea from a YOLOv2 talk: they used k-means clustering to find the most common bounding-box shapes in the training data, and based the anchor boxes on those.
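The k-means idea can be sketched as clustering (w, h) box shapes with 1 - IoU as the distance (the YOLOv2 paper uses this IoU-based distance rather than Euclidean; everything else below is a plain hand-rolled k-means, not their actual code):

```python
import random

def kmeans_anchors(whs, k=2, iters=50, seed=0):
    """Cluster (w, h) box shapes into k anchor shapes, using
    1 - IoU (boxes compared centred on each other) as the distance."""
    def iou_wh(a, b):
        inter = min(a[0], b[0]) * min(a[1], b[1])
        return inter / (a[0] * a[1] + b[0] * b[1] - inter)

    rng = random.Random(seed)
    centers = rng.sample(whs, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for wh in whs:
            # assign each shape to the centre it overlaps most
            best = max(range(k), key=lambda i: iou_wh(wh, centers[i]))
            clusters[best].append(wh)
        centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers
```

Run on a dataset's ground-truth boxes, the returned centres are the anchor shapes that best cover the shapes actually present in the data.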

The v1 CVPR video – this is from before they used learned anchor boxes, but it’s still useful as a starting point. The v2 video, I think, talks about how they find the anchors.

13 Likes

Yes. Ground truth are the actual annotations given to us, along with the class labels.

3 Likes

it should be SSD (not xD), Single Shot Detection.

Could we consider the new final ground truth object a dense representation of what was sparsely present in the dataset out of the box?

What is the loss function describing, physically?

Correct me if I’m wrong… Since YOLO outputs just 4 coordinates plus the class outputs, it is not able to detect multiple objects, right? Is there only one bounding box in YOLO’s predictions?

2 Likes

Okay, I think I get it: the anchor boxes are only for calculating the loss function?

2 Likes

My understanding, might be wrong though, is that anchor boxes are predefined in ratio, size, and location. The more varieties one defines, the higher the IoU with the ground truth can be. (I studied YOLOv2, so this is my understanding.)

I would suggest @jeremy assign some papers to read relating to the upcoming class. Maybe this helps.

Like the implementation details though, i.e. adding the null class and removing it.

7 Likes

Jaccard index = area of intersection/area of union

this is for matching the anchor boxes with the ground-truth bounding boxes
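That formula as a tiny helper (a sketch; the (x1, y1, x2, y2) corner format is an assumption, not the lesson’s code):

```python
def jaccard(box_a, box_b):
    """IoU / Jaccard index of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # 0 if no overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

e.g. two unit-offset 2x2 boxes overlap in a 1x1 square, giving 1 / (4 + 4 - 1) = 1/7.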

1 Like

Definitely would appreciate a few papers or articles from Jeremy or others!