Part 2 Lesson 9 wiki

Hi, I have been trying object detection on the https://cg.cs.tsinghua.edu.cn/traffic-sign/ dataset, and y[0] returns 24 while y[1] returns 8 for a minibatch of size 16. Where am I going wrong?

@chloews awesome visualization. How can I compare the predictions made in step 8 with the ground truth, which is 4 coordinates (xmin, ymin, xmax, ymax)?
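One common way to compare a prediction with a ground-truth box given as corners is IoU (Jaccard overlap); if the predictions are in another format (e.g. center + width/height), convert them to corners first. A minimal sketch, not from the lesson code, assuming both boxes are plain tuples in the same pixel coordinates:

```python
def iou(a, b):
    """a, b: boxes as (xmin, ymin, xmax, ymax) in the same coordinate space."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```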

Can you provide the entire show_nmf_single function?

Do you have updates on this topic?

I am trying to apply YOLO v3 object detection to a custom dataset, and I am following this link.

Here, in step #1, we have to convert our data to Darknet format. One step specifically requires us to normalize the box coordinates in the input labels to be between 0 and 1. So I converted the labels in my dataset, which provide 4 pairs of coordinates and 1 class per box, to x_center, y_center, w, and h values. My question is: how do I normalize these?

I used an image size of 416*416*3, so the input image to the first conv layer is 416*416, but this doesn't work for all labels: in some cases the midpoint of the bounding box coordinates exceeds 1.

Example from the original label file, starting clockwise from X1:

```
X1        X2         X3        X4         Y1         Y2         Y3         Y4
544.8015  579.83813  541.4978  506.46115  42.720642  60.455795  136.19897  118.46382
```

Converted labels:

```
class  x_center            y_center             w                    h
0      1.3056481971153848  0.21504761057692307  0.09439806497198584  0.20407237344933887
```

As you can see, the x_center value is > 1. Any help is appreciated.
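For reference: 1.3056 × 416 ≈ 543, which matches the pixel x-center of the box above, so it looks like the coordinates were divided by the 416 network input size rather than by the original image dimensions. Darknet format expects x_center and w divided by the source image's width, and y_center and h divided by its height, which keeps everything in [0, 1] regardless of the network input size. A minimal sketch of that conversion (the 1280×720 image dimensions are hypothetical, just for illustration):

```python
# Convert 4 corner points (pixel coords) to Darknet's normalized
# "class x_center y_center w h" format. Normalize by the ORIGINAL
# image width/height, not by the 416x416 network input size.

def corners_to_darknet(xs, ys, img_w, img_h, cls=0):
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    x_center = (xmin + xmax) / 2 / img_w  # x values normalized by image WIDTH
    y_center = (ymin + ymax) / 2 / img_h  # y values normalized by image HEIGHT
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cls, x_center, y_center, w, h

# Values from the label above; 1280x720 is a hypothetical source-image size.
xs = [544.8015, 579.83813, 541.4978, 506.46115]
ys = [42.720642, 60.455795, 136.19897, 118.46382]
print(corners_to_darknet(xs, ys, img_w=1280, img_h=720))
# -> (0, 0.424..., 0.124..., 0.057..., 0.130...) -- all within [0, 1]
```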

Hello,

A quick question about the fact that we create one Conv2d module for each of the tasks (bboxes and classes): I do not understand how that is different from creating one module with more output channels. Since each output channel has its own kernel parameters, it seems that each channel could specialise just as well. It is a bit easier to work with the labels, since we can use tuple indexes to match the label format, but otherwise I do not see how it is better. Anyone have thoughts on this?
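For what it's worth, the two designs are mathematically equivalent: a single conv with 4 + n_classes output channels has exactly the same parameters as two separate convs with 4 and n_classes output channels, just concatenated along the channel dimension. A minimal PyTorch sketch (names are illustrative, not from the lesson code):

```python
import torch
import torch.nn as nn

n_in, n_classes = 256, 20

class TwoHeads(nn.Module):
    """One Conv2d per task, as described above."""
    def __init__(self):
        super().__init__()
        self.bbox = nn.Conv2d(n_in, 4, 3, padding=1)          # box activations
        self.clas = nn.Conv2d(n_in, n_classes, 3, padding=1)  # class scores
    def forward(self, x):
        return self.bbox(x), self.clas(x)

class OneHead(nn.Module):
    """A single Conv2d with more output channels, split afterwards."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(n_in, 4 + n_classes, 3, padding=1)
    def forward(self, x):
        out = self.conv(x)
        return out[:, :4], out[:, 4:]  # same split, by channel index

x = torch.randn(2, n_in, 4, 4)
b1, c1 = TwoHeads()(x)
b2, c2 = OneHead()(x)
assert b1.shape == b2.shape and c1.shape == c2.shape  # identical shapes
```

So the split is mainly organizational: separate modules keep the box and class heads readable and let you treat them differently (e.g. different initialization), but the representational power is identical.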