Part 2 Lesson 9 wiki

If there are two overlapping bounding boxes within an anchor box, can the model only predict one of the classes?

It depends on how we map the ground truth to represent the data. I’m thinking that we would have 16 boxes with 16 categories in the example here.

The anchor boxes are used for calculating the loss function. Also, the predicted box is the anchor box plus activations.

No, YOLO also does multi-class prediction.
In YOLO you have a flattened dense layer with no geometry. In SSD you predict directly from the last conv2d. (Well, as he said, this morning they released the YOLOv3 paper, and it is switching to the SSD method.)

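
    def actn_to_bb(actn, anchors):
        # grid_sizes and hw2corners come from outside this function
        actn_bbs = torch.tanh(actn)  # squash activations into (-1, 1)
        # shift each anchor center by at most half a grid cell
        actn_centers = (actn_bbs[:,:2]/2 * grid_sizes) + anchors[:,:2]
        # scale each anchor height/width by a factor between 0.5 and 1.5
        actn_hw = (actn_bbs[:,2:]/2+1) * anchors[:,2:]
        return hw2corners(actn_centers, actn_hw)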

This is where I got the idea that the anchor boxes are the basis for the (predicted) bounding boxes . . .

okay, Rachel addressed this above in her reply to Ducky . . .

Your predicted boxes are anchor boxes plus activations. Since you are adding activations (which are learned), the predicted boxes can have “moved” from the location of the anchor boxes.
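
To make that concrete, here is a minimal pure-Python sketch of the same arithmetic as `actn_to_bb`, for a single anchor. The function name `anchor_plus_activations`, the example numbers, and the (center, center, height, width) layout are assumptions for illustration; the notebook version works on whole tensors.

```python
import math

def anchor_plus_activations(actn, anchor, grid_size):
    """Turn 4 raw activations into a predicted box for one anchor.

    actn      : 4 raw numbers from the network (any real values)
    anchor    : (center_0, center_1, height, width) in 0..1 image units
    grid_size : side length of one grid cell, e.g. 0.25 for a 4x4 grid
    """
    t = [math.tanh(a) for a in actn]  # squash each activation into (-1, 1)
    # Centers can shift by at most half a grid cell in either direction.
    centers = [t[i] / 2 * grid_size + anchor[i] for i in range(2)]
    # Height and width are scaled by a factor between 0.5 and 1.5.
    hw = [(t[i] / 2 + 1) * anchor[i] for i in range(2, 4)]
    return centers + hw

anchor = (0.125, 0.125, 0.25, 0.25)  # top-left cell of a 4x4 grid (assumed)
unchanged = anchor_plus_activations([0, 0, 0, 0], anchor, 0.25)
extreme = anchor_plus_activations([50, 50, 50, 50], anchor, 0.25)
print(unchanged)  # zero activations leave the anchor where it is
print(extreme)    # very large activations hit the movement and scale limits
```

With all-zero activations the predicted box is exactly the anchor; the learned activations are what move and resize it.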

How do we decide the sizes and ratios of the anchor boxes to create? Is there an API in fastai to generate them, like we have for transformations?

Are anchor boxes and grid cells the same?

In the deeplearning.ai MOOC they were treated as different things. A grid cell was a single part of the division of the original image into N×N, while the anchor boxes were the predicted volume, say 4×4×((4+C)×K).
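
As a rough check of that shape arithmetic, here is a small sketch. It assumes C is the number of classes and K is the number of anchor boxes per grid cell (my reading of the notation), and the values 4, 21 and 9 are illustrative only.

```python
def output_shape(grid, n_classes, k):
    """Shape of the predicted volume: grid x grid x ((4 + C) * K)."""
    per_anchor = 4 + n_classes  # 4 box coordinates plus one score per class
    return (grid, grid, per_anchor * k)

shape = output_shape(grid=4, n_classes=21, k=9)
n_anchors = shape[0] * shape[1] * 9          # one prediction per anchor box
n_outputs = shape[0] * shape[1] * shape[2]   # total numbers predicted
print(shape, n_anchors, n_outputs)
```

So the grid cell only says where in the image you are, while K controls how many candidate boxes are predicted at that location.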

Something I never got: do anchor boxes ever change/transform or is it more of a “let me match my anchor boxes to the grid cell and see which one works the best”?

I guess! That’s how it’s working/learning

They have very limited movement: at most half a grid cell in x and/or y, and no rotation.

But there’s also a host of anchor boxes to give you good ‘reception’ of the actual object.

Anchor boxes are used within each grid cell. The grid cell just splits up the image; the anchors are used for the loss calculation.
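
Here is a simplified sketch of how anchors can feed into the loss: each anchor is matched to the ground truth box it overlaps most, and the loss is then computed per anchor. The helper names `iou` and `match_anchors`, the 0.5 threshold, and the example boxes are all assumptions for illustration, and this is not the notebook's exact matching code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (top, left, bottom, right)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match_anchors(gt_boxes, anchors, threshold=0.5):
    """For each anchor, return the index of its best ground truth box,
    or None (background) if the best overlap is below the threshold.
    Assumes gt_boxes is not empty."""
    matches = []
    for anc in anchors:
        overlaps = [iou(anc, gt) for gt in gt_boxes]
        best = max(range(len(gt_boxes)), key=lambda i: overlaps[i])
        matches.append(best if overlaps[best] > threshold else None)
    return matches

# A 2x2 grid of anchors and two ground truth boxes that both sit in the first cell.
anchors = [(0.0, 0.0, 0.5, 0.5), (0.0, 0.5, 0.5, 1.0),
           (0.5, 0.0, 1.0, 0.5), (0.5, 0.5, 1.0, 1.0)]
gt_boxes = [(0.0, 0.0, 0.5, 0.4), (0.05, 0.0, 0.5, 0.5)]
matches = match_anchors(gt_boxes, anchors)
print(matches)
```

In this sketch an anchor is assigned at most one ground truth box, so when two objects overlap the same anchor, only the better-overlapping one is used as that anchor's target.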

So the activations are saying “take the anchor box and make it smaller (or larger) and adjust the x coordinate by X and the y coordinate by Y in order to get your predicted box”

Is that correct?

I’m not familiar with YOLO. Can you tell me the shape of the prediction outputs and how it helps with multi-class classification? I’m assuming there is only 1 bbox and 21 classes. Correct me if I’m wrong.

It’s in the “Set up model” section: look at anc_offset, anc_ctrs, and anc_sizes.
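
As a rough idea of what such a setup can look like, here is a pure-Python sketch that builds anchors as (center_y, center_x, height, width) tuples. The function `make_anchors` and the particular grids, zooms and ratios are assumptions chosen for illustration; the notebook builds the equivalent arrays with its own variables.

```python
def make_anchors(grid, zooms=(1.0,), ratios=((1.0, 1.0), (1.0, 0.5), (0.5, 1.0))):
    """Anchors for a grid x grid layout, one per (cell, zoom, ratio) combination."""
    cell = 1.0 / grid    # side length of one grid cell in 0..1 image units
    offset = cell / 2    # distance from a cell's corner to its center
    anchors = []
    for row in range(grid):
        for col in range(grid):
            cy, cx = offset + row * cell, offset + col * cell
            for zoom in zooms:
                for rh, rw in ratios:
                    anchors.append((cy, cx, cell * zoom * rh, cell * zoom * rw))
    return anchors

anchors_4x4 = make_anchors(4)
# Several grid sizes and zoom levels combined into one list of anchors.
all_anchors = [a for g in (4, 2, 1) for a in make_anchors(g, zooms=(0.7, 1.0, 1.3))]
print(len(anchors_4x4), len(all_anchors))
```

The sizes and ratios are hyperparameters: you pick a spread that roughly covers the object shapes you expect, and the activations handle the fine adjustment.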

“Don’t worry if it’s a bit complicated at first”

Phew!

I understand why we have a 4x4 grid of receptive fields (with 1 anchor box each) to coarsely localize objects in the image. In this case, every ground truth bbox has an anchor box that is associated with it, but not every anchor box has a bounding box associated with it.

What I think I’m missing is why we need multiple receptive fields at different sizes, each with multiple anchor boxes of differing ratios associated with them.

The first version already included 16 receptive fields, each with a single anchor box associated with them. With the additions, there are now many more anchor boxes to consider. Why are those additional anchor boxes necessary if we already had 16 anchor boxes to correspond with each of the possible 16 objects to detect?

Is this because you constrained how much a receptive field could move or scale from its original size? Or is there another reason?
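
One way to look at it, as a back-of-the-envelope sketch: `actn_to_bb` only lets an anchor scale to between 0.5x and 1.5x of its size, so a small anchor can never grow to fit a large object. The helper `best_possible_iou` below is hypothetical and assumes square boxes that are perfectly centered on each other, which gives an upper bound on the overlap.

```python
def best_possible_iou(anchor_side, object_side, min_scale=0.5, max_scale=1.5):
    """Upper bound on IoU between a square object and a rescaled square anchor,
    assuming the two boxes share the same center."""
    lo, hi = anchor_side * min_scale, anchor_side * max_scale
    side = min(max(object_side, lo), hi)       # closest side length the anchor can reach
    small, big = sorted((side, object_side))
    return (small / big) ** 2                  # IoU of two concentric squares

object_side = 0.8  # an object covering most of the image (assumed example)
for grid in (4, 2, 1):
    print(grid, best_possible_iou(1.0 / grid, object_side))
```

Under these assumptions a 4x4 anchor tops out at an IoU of about 0.22 with that object, while the single 1x1 anchor can match it exactly, which is one reason to have anchors at several scales.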

Could we take a different approach here that doesn’t require us to manually code a loss function with this complex logic of anchor boxes?

E.g., take an adversarial approach, where a second net tries to guess whether the bounding boxes come from our ConvNet or from the ground truth. A method like this is used for aligning word vector spaces: https://github.com/facebookresearch/MUSE

I’m confused about this too! :face_with_raised_eyebrow:

There’s your reading list:

https://arxiv.org/abs/1506.01497 (Faster R-CNN)
https://arxiv.org/abs/1506.02640 (YOLO)
https://arxiv.org/abs/1512.02325 (SSD)
https://arxiv.org/abs/1708.02002 (Focal Loss / RetinaNet)
