Part 2 Lesson 9 wiki


(Sarada Lee) #259

Here is my understanding, please correct me if I am wrong.

To add one = add a new category (bg: background) in addition to the existing categories. However, we don’t want the model to predict background.

To subtract one, only bounding boxes with existing categories remain.


(Kevin Bird) #260

Some of my highlights from YOLOv3 (There are so many, just read it):

Figure 3. Again adapted from the [7], this time displaying speed/accuracy tradeoff on the mAP at .5 IOU metric. You can tell YOLOv3 is
good because it’s very high and far to the left. Can you cite your own paper? Guess who’s going to try, this guy → [14].

But maybe a better question is: “What are we going to
do with these detectors now that we have them?” A lot of
the people doing this research are at Google and Facebook.
I guess at least we know the technology is in good hands
and definitely won’t be used to harvest your personal information
and sell it to… wait, you’re saying that’s exactly
what it will be used for?? Oh.
Well the other people heavily funding vision research are
the military and they’ve never done anything horrible like
killing lots of people with new technology oh wait…
I have a lot of hope that most of the people using computer
vision are just doing happy, good stuff with it, like
counting the number of zebras in a national park [11], or
tracking their cat as it wanders around their house [17]. But
computer vision is already being put to questionable use and
as researchers we have a responsibility to at least consider
the harm our work might be doing and think of ways to mitigate
it. We owe the world that much.


(Kevin Bird) #261

Can somebody explain or link a paper about clusters and scales? I think the clusters are basically what Jeremy was using for the anchor boxes last night when we started out with all of them even (1 cluster) and then added different sizes (so for YOLOv3 it would have 9 clusters). But would the scale mean that each of these will be a different size, so does this mean that there will be 27 total anchor boxes of the different dimensions (clusters) and each will have a small, medium, and large size (scale)?

From YOLOv3 last paragraph of 2.3

We still use k-means clustering to determine our bounding
box priors. We just sort of chose 9 clusters and 3
scales arbitrarily and then divide up the clusters evenly
across scales. On the COCO dataset the 9 clusters were:
(10×13), (16×30), (33×23), (30×61), (62×45), (59×119), (116×90), (156×198), (373×326).
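One way to read "divide up the clusters evenly across scales" is that the 9 priors are shared out 3 per scale, giving 9 anchor shapes in total rather than 27. The sketch below is only an illustration of that reading: the `split_across_scales` helper and the sort-by-area rule are assumptions, not code from the paper.

```python
# The 9 COCO priors (width, height) as quoted from the paper.
priors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
          (59, 119), (116, 90), (156, 198), (373, 326)]

def split_across_scales(priors, n_scales=3):
    """Hypothetical helper: sort priors by area, then cut into equal chunks."""
    ordered = sorted(priors, key=lambda wh: wh[0] * wh[1])
    per_scale = len(ordered) // n_scales
    return [ordered[i * per_scale:(i + 1) * per_scale] for i in range(n_scales)]

groups = split_across_scales(priors)
for name, group in zip(["small", "medium", "large"], groups):
    print(name, group)

# Each prior is used at exactly one scale, so the total stays at 9.
total = sum(len(g) for g in groups)
print("total priors:", total)
```

Under this reading, each scale works with its own 3 box shapes instead of every shape being repeated at every scale.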


(Jeremy Howard) #262

Hey folks don’t forget this is a wiki thread - so please copy useful links over to the top post so they’re all in one central place! :slight_smile:


(Jeremy Howard) #263

You can do that by building a model that predicts just one number - the amount of rotation. It’s actually a great class project to try that previous students have found helpful.


(Jeremy Howard) #264

Remind me to do this next week if I forget. There should be a paper coming out on Arxiv tomorrow that discusses it.


(Jeremy Howard) #265

I would expect so, since otherwise the gradients will be overwhelmed by the bit with the larger scale. But I haven’t tested this intuition to know how much it matters - it would be an interesting thing to experiment on and write about.


(Jeremy Howard) #267

Not yet - very little of the stuff shown in part 2 has been integrated into fastai as yet. It’s all new stuff and we’re using it to help you understand all the moving parts of deep learning and its implementation.


(Brian Muhia) #268

Is there an updated link with the first half of the livestream? I’m reviewing and can’t see it yet.


(Jeremy Howard) #269

Nearly! 3 zoom * 3 aspect = k = 9. At 3 scales, 4x4 (16) + 2x2 (4) + 1x1 (1) = 21. 21*9 = 189
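The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: only the counts (3 zooms, 3 aspect ratios, grids of 4x4, 2x2 and 1x1) come from the post, while the actual zoom and aspect values are placeholders made up for illustration.

```python
# Placeholder values: only how many there are matters for the count.
zooms = [0.7, 1.0, 1.3]                          # 3 zoom levels (assumed values)
aspects = [(1.0, 1.0), (1.0, 0.5), (0.5, 1.0)]   # 3 aspect ratios (assumed values)
grid_sizes = [4, 2, 1]                           # 4x4, 2x2 and 1x1 grids

k = len(zooms) * len(aspects)            # anchor shapes per grid cell
cells = sum(g * g for g in grid_sizes)   # 16 + 4 + 1 grid cells
total_anchors = k * cells

print(k, cells, total_anchors)
```

So every grid cell carries all k shapes, and the total is the number of cells times k.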


(Jeremy Howard) #270

I always post an edited video ~24 hours after the lecture is done. I’m working on it :slight_smile:


(Suvash) #271

Thanks again Jeremy.


(Jeremy Howard) #272

It’s a terrific paper and highly recommended. The author is, apparently, sick of the BS required to actually get published and so has decided to conspicuously do the opposite, which I think is quite awesome…


(Jeremy Howard) #273

Let’s all do our best to be generous in our interpretations of people’s words. We’re all doing our best to figure things out here, and sometimes that means we’ll all write stuff which turns out to be not quite right - and that’s fine, because the following discussion will help resolve it.


(Kevin Bird) #274

It really was a fantastic read. I love this style.


(Jeremy Howard) #276

I’ve updated the top post now with many of the resources recommended by you all in this thread.


(Jeremy Howard) #277

I’ve now posted the edited video to the top post.


(Hiromi Suenaga) #278

For some reason, the video does not load for me. All I can see is the title and a black box.


(James Requa) #279

Yeah, it doesn’t load for me either.


(Jeremy Howard) #280

Youtube is still processing it. Should be done in ~5 mins.

edit: actually it’s taking a really long time. dunno if there’s a youtube problem. will re-upload soon if it doesn’t appear