Part 2 Lesson 9 wiki

Anchor boxes are initially chosen by dividing the image space evenly into a grid.
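A minimal sketch of that idea (not the lesson's exact code; the helper name and the unit-square coordinates are my own choices): for an n×n grid, each anchor starts centred in its cell with the cell's height and width.

```python
import torch

# Sketch: anchor centers and sizes for an n x n grid laid evenly over the unit square.
def make_grid_anchors(n=4):
    cell = 1.0 / n                                                    # side length of one grid cell
    offsets = torch.arange(n, dtype=torch.float) * cell + cell / 2    # cell centers along one axis
    cy = offsets.repeat_interleave(n)                                 # row center for every cell
    cx = offsets.repeat(n)                                            # column center for every cell
    centers = torch.stack([cy, cx], dim=1)                            # (n*n, 2)
    sizes = torch.full((n * n, 2), cell)                              # each anchor starts as one full cell
    return torch.cat([centers, sizes], dim=1)                         # (n*n, 4) as (cy, cx, h, w)

anchors = make_grid_anchors(4)
print(anchors.shape)   # torch.Size([16, 4])
```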

What is this Excel wizardry?

Quote:

"Prior detection systems repurpose classifiers or localizers to perform detection. They apply the model to an image at multiple locations and scales. High scoring regions of the image are considered detections.

We use a totally different approach. We apply a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities."

https://pjreddie.com/darknet/yolo/

Is there anything like a non-local receptive field, i.e. using data farther from the center pixel?

Can you elaborate on what you mean?

I see in class StdConv: can you just do super().__init__() as opposed to super(StdConv, self).__init__()?

Why is the bias value -3?

OK, I just googled it; it’s Python 3 syntax :slight_smile: cool

Where do you see that?

head_reg4 = SSD_Head(k, -3.)
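For what it’s worth, my reading of the notebook is that this -3. ends up as the initial bias of the final classification convolution inside SSD_Head, and sigmoid(-3) ≈ 0.047, so the per-anchor class probabilities start close to zero and early training isn’t dominated by spurious high-confidence detections:

```python
import torch

# sigmoid(-3) ~ 0.047: a negative initial bias keeps the post-sigmoid
# class scores small at the start of training.
print(torch.sigmoid(torch.tensor(-3.0)))   # tensor(0.0474)
```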

Just to confirm an intuition: this architecture is finding objects with non-overlapping bounding boxes?

super().__init__() only works in Python 3, but it could be done.

Yep, it’s actually the recommended form in Python 3.
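For reference, a minimal side-by-side of the two forms (the layer inside is just a placeholder, not the actual StdConv):

```python
import torch.nn as nn

class StdConvOld(nn.Module):
    def __init__(self, nin, nout):
        # explicit form: works on both Python 2 and Python 3
        super(StdConvOld, self).__init__()
        self.conv = nn.Conv2d(nin, nout, 3, stride=2, padding=1)

class StdConvNew(nn.Module):
    def __init__(self, nin, nout):
        # zero-argument form: Python 3 only, resolves to the same call
        super().__init__()
        self.conv = nn.Conv2d(nin, nout, 3, stride=2, padding=1)
```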

Yes, but my understanding is that we’d like to avoid manual tuning as much as possible (somewhat in the spirit of differentiable programming).
Maybe we could scale the inputs by putting them through a log (this will still break on huge scale differences)? However, I’m not sure whether that makes sense.
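A hypothetical sketch of what’s being discussed: the localization (L1) and classification (BCE) terms live on different scales, so you either weight one of them or transform it before summing. The function name and the fixed loc_weight here are my own, not the lesson’s code:

```python
import torch.nn.functional as F

# Combine a bbox-regression term and a classification term that live on
# different scales; a fixed weight (or a log transform, as suggested above)
# keeps one term from swamping the other.
def detection_loss(bb_pred, bb_true, clas_pred, clas_true, loc_weight=1.0):
    loc_loss = F.l1_loss(bb_pred, bb_true)                                         # bbox regression term
    clas_loss = F.binary_cross_entropy_with_logits(clas_pred, clas_true.float())   # classification term
    return loc_weight * loc_loss + clas_loss
```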

Can you explain what the variable k represents in SSD_Head and its associated modules?

We will get into more detail later. These are anchor boxes, and the network will later learn offsets to them (allowing the boxes to overlap). They are used so that you don’t have all the boxes learning the same object when there are multiple objects in the image.
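Roughly what “learning offsets to the anchors” can look like, along the lines of the lesson’s actn_to_bb (the function name and exact scaling factors here are assumptions, not the notebook’s code):

```python
import torch

def activations_to_boxes(actn, anchors, grid_size):
    """Map raw activations to boxes expressed as offsets from the anchors.

    actn:      (n_anchors, 4) raw network outputs
    anchors:   (n_anchors, 4) as (cy, cx, h, w)
    grid_size: side length of one grid cell
    """
    actn = torch.tanh(actn)                                  # squash offsets to [-1, 1]
    centers = anchors[:, :2] + actn[:, :2] * grid_size / 2   # move each center within its cell
    sizes = anchors[:, 2:] * (actn[:, 2:] / 2 + 1)           # scale h/w between 0.5x and 1.5x
    return torch.cat([centers, sizes], dim=1)
```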

But I wonder how we would change the enum stuff in the transformations and augmentations to accommodate this format of points.

Also, they’re no longer rectangles; they’re basically quadrilaterals. And our dataset isn’t labelled that way. I wonder if that would change anything. Just thinking out loud here.

Thinking about this, I have a question: does the difference in the scales of the loss functions matter?

Could you reduce the anchor box size to 1x1 to get a segmentation of the entire image?

Yes, he’ll probably discuss that later.
