Part 2 Lesson 9 wiki

(Vikrant Behal) #215

This week until next Monday is a whole lot of work!

(Wayne Nixalo) #216

Not full walkthroughs, but pretty good run-throughs: Two Minute Papers

Looks like a new channel aimed at going in-depth. His VAE video is great: ArXiv Insights

(Arvind Nagaraj) #217

Yes, we will see this in the segmentation problem (U-Net, LinkNet, and other models).

(nirant) #218

Understanding Keywords:

Single Shot Detection, or the SSD MultiBox work Jeremy discussed, introduces a lot of new concepts, such as the matching problem, non-maximum suppression, and rethought loss functions.

I found this useful for getting a quick grasp of the terms:

Hope you do too :slight_smile:
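For the non-maximum suppression term, here is a minimal plain-Python sketch of the greedy version as it is commonly described: sort the boxes by confidence, keep the best one, drop any remaining box whose IoU (intersection over union) with it exceeds a threshold, and repeat. The (x1, y1, x2, y2) box format, the 0.5 threshold, and the function names are my own assumptions for illustration, not the lesson notebook's code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2) corners


def iou(a: Box, b: Box) -> float:
    """Intersection over union (Jaccard index) of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of the kept boxes, most confident first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # most confident box still in play
        keep.append(best)
        # suppress every remaining box that overlaps the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # → [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed; the third box does not overlap anything and survives.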

(Sharwon Pius) #219

They’ve only covered a limited number of papers to date. It’s a new channel, so hopefully we’ll get to see more.

(Stephen Rimac) #220

Thank you @jeremy and @rachel for awesome lesson! Really enjoying part 2 so far. Nice work!

(Sharwon Pius) #221

Good that we have holidays this weekend. Perfect timing!
Going through the code as well as the papers will be extremely challenging.

(Arvind Nagaraj) #222

The camera input goes through SSD in most autonomous cars, but of course they also have lidars and other instruments generating auxiliary information.

(Kevin Bird) #223

Is the maximum number of objects detected limited by anything? Like if I had 9 anchor boxes, would I be limited to 9 objects being detected?

(Hiromi Suenaga) #224

If you didn’t scale the anchor boxes or change their dimensions, then yes, just 9.

In the examples Jeremy gave, there were 3 size scalings and 3 aspect ratios (1:2, 1:1, 2:1) for each box, so you can detect a maximum of 81 objects.
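To make the counting concrete, here is a small sketch that builds one anchor per (grid cell, zoom, aspect ratio) combination on a 3×3 grid of base boxes. The helper `make_anchors`, the (center_x, center_y, width, height) format, and the zoom values are hypothetical choices for illustration, not the lesson notebook’s code.

```python
from itertools import product
from typing import List, Tuple

# assumed format: (center_x, center_y, width, height), as fractions of the image size
Anchor = Tuple[float, float, float, float]


def make_anchors(grid_size: int, zooms: List[float],
                 ratios: List[Tuple[float, float]]) -> List[Anchor]:
    """Hypothetical helper: one anchor per (grid cell, zoom, aspect ratio)."""
    cell = 1.0 / grid_size
    anchors: List[Anchor] = []
    for row, col in product(range(grid_size), repeat=2):
        cx, cy = (col + 0.5) * cell, (row + 0.5) * cell  # center of this grid cell
        for zoom, (rw, rh) in product(zooms, ratios):
            anchors.append((cx, cy, cell * zoom * rw, cell * zoom * rh))
    return anchors


zooms = [0.7, 1.0, 1.3]                        # 3 size scalings (assumed values)
ratios = [(1.0, 0.5), (1.0, 1.0), (0.5, 1.0)]  # 2:1, 1:1, 1:2 as (width, height) multipliers
anchors = make_anchors(grid_size=3, zooms=zooms, ratios=ratios)
print(len(anchors))  # 9 cells * 3 zooms * 3 ratios → 81
```

In this kind of design each anchor is one slot for a prediction, so the anchor count bounds the number of detections; adding grid sizes, zooms, or ratios raises that bound.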

(Lucas Goulart Vazquez) #225

I see your point, but I’m still unsure whether this matters when backpropagating the gradients. The weights are changed based on how much the loss changes with respect to them, so I’m not certain that one loss being 10 times bigger than the other means the weights are 10x more sensitive to that loss.
I’ve had this question for a long time; I probably need to do some math to see it :sweat_smile:

(Arvind Nagaraj) #226

No… the only thing that will limit you is the number of classes you are trying to predict.

(Arvind Nagaraj) #227

For the object detection history slide Jeremy showed, this famous blog post provides a lot of the details:

I also liked a TED talk by the YOLO researcher:

(Arvind Nagaraj) #228

…at that particular position where k = 9 anchor boxes were generated.

(Hiromi Suenaga) #229

Right, I meant more like: “if there were 9 base square boxes and you created 9 anchor boxes for each one, you could only detect up to 81 objects.” There will always be limits on the number of objects you can detect, based on your design choices.

(Wayne Nixalo) #230

I had a lot of context going into this lesson which really helped. Here’s a list of videos and papers, roughly in order, that I went through during my Computer Vision deep dive:

Also, arXiv Vanity converts papers to HTML and makes them a bit easier on the eyes.

Of all of them, the two CS231N lectures, both taught by J. Johnson, were the most important for me. Both present something of a history and logical progression of object detection methods, with the first halves about R-CNN (region-based ConvNets) and the latter halves on SSDs. I highly recommend these two for conceptual background for this lesson. Everything here (except Focal Loss) is covered in them.

  • CS231N S2016 L8: Localization and Detection

  • CS231N W2017 L11: Detection and Segmentation

  • YOLO CVPR 2016 talk: covers the idea of using grid cells and treating detection as a regression problem in more detail.

  • YOLOv2 talk: there is some good information in this talk, although some drawn explanations are omitted from the video. What I found interesting was the bit on learning anchor boxes from the dataset. There’s also the crossover with NLP at the end.

  • Focal Loss ICCV17 talk: also recommended.
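The YOLOv2 idea of learning anchor boxes from the dataset can be sketched as k-means over ground-truth box widths and heights, using 1 - IoU instead of Euclidean distance so that large boxes don’t dominate the error. This is a minimal illustration under my own assumptions (naive first-k initialization, mean centroid update, fixed iteration count, made-up box sizes), not the paper’s exact procedure.

```python
from typing import List, Tuple

WH = Tuple[float, float]  # (width, height) of a ground-truth box


def wh_iou(a: WH, b: WH) -> float:
    """IoU of two boxes sharing the same center, so only width/height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes: List[WH], k: int, iters: int = 20) -> List[WH]:
    """k-means-style clustering of box shapes with distance = 1 - IoU."""
    centroids = list(boxes[:k])  # naive deterministic init: the first k boxes
    for _ in range(iters):
        clusters: List[List[WH]] = [[] for _ in range(k)]
        for box in boxes:
            # smallest 1 - IoU distance is the same as largest IoU
            nearest = max(range(k), key=lambda j: wh_iou(box, centroids[j]))
            clusters[nearest].append(box)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = (
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                )
    return centroids


boxes = [(10, 30), (50, 50), (12, 28), (48, 52), (11, 31), (52, 49)]
print(kmeans_anchors(boxes, k=2))
```

With these toy boxes the two centroids settle on a tall, narrow shape and a large, roughly square shape, which would then serve as the anchor box priors.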


I’m sure these have been linked a lot by now, so just here for reference.

I also looked through some videos to focus on specific topics:

Coursera Andrew Ng videos:

Going forward, I haven’t gotten to Feature Pyramid Networks yet, so I’m glad to see we’re going to cover them. For Mask R-CNN-type methods, I had a macabre hunch that Facebook would be very good at picking out faces from crowds… and they have some great research on the topic if you want to check it out:


I may have missed a few things but that’s basically it.

(Divyansh Jha) #231

As a beginner, I had a very hard time understanding these. J. Johnson gives a very high-level idea without details. I would not recommend these to pure beginners.

Instead, I would recommend Week 3 of Course 4 of the MOOC as an awesome starting point; later, one can move on to the lectures mentioned above.

(Emil) #232

An imbalanced loss function would lead to a situation where you have a relatively low loss value only because the model is doing well on one part of the task but not the other. My guess is that imbalanced losses lead to overfitting on one task and underfitting on another. It also somewhat resembles input and layer normalization, where we scale values in a similar fashion.

Math-wise, I think it shows up in the weight updates: $W = W - \alpha \frac{\partial L}{\partial W}$, and if $L = 1000\,L_a + L_b$, then $W = W - 1000\,\alpha \frac{\partial L_a}{\partial W} - \alpha \frac{\partial L_b}{\partial W}$. Here the update from $L_a$ is 1000 times bigger than the one from $L_b$ (assuming $L_a$ and $L_b$ have similar scale).
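A quick numeric check of this point, which also bears on Lucas’s question: because differentiation is linear, scaling L_a by 1000 scales its contribution to the gradient by exactly 1000. This toy sketch assumes a single weight, made-up squared-error losses, and plain gradient descent; it uses a central finite-difference gradient so no autodiff library is needed.

```python
from typing import Callable


def loss_a(w: float) -> float:
    """Toy L_a: squared error of prediction w * 2.0 vs target 3.0 (made-up numbers)."""
    return (w * 2.0 - 3.0) ** 2


def loss_b(w: float) -> float:
    """Toy L_b: squared error of prediction w * 1.0 vs target 2.0 (made-up numbers)."""
    return (w * 1.0 - 2.0) ** 2


def numeric_grad(f: Callable[[float], float], w: float, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of df/dw."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)


w, alpha, weight_a = 0.5, 1e-4, 1000.0

g_a = numeric_grad(loss_a, w)  # dL_a/dW, analytically 2 * (2w - 3) * 2 = -8 at w = 0.5
g_b = numeric_grad(loss_b, w)  # dL_b/dW, analytically 2 * (w - 2) = -3 at w = 0.5
g_total = numeric_grad(lambda v: weight_a * loss_a(v) + loss_b(v), w)

# Linearity: d(1000 L_a + L_b)/dW equals 1000 dL_a/dW + dL_b/dW
print(g_total, weight_a * g_a + g_b)

# One plain gradient-descent step, W = W - alpha * dL/dW
w_new = w - alpha * g_total
print(w_new)
```

This is about the weighting factor rather than the raw loss value: a loss that is merely 10x larger in value does not necessarily have a 10x larger gradient, which is why the comparison above assumes the two losses have similar scale.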

(Davide Boschetto) #233

Hi there, there’s a problem: the live stream recording is only 1h59m long, and I suppose the lecture was longer than that, given that the stream starts in the middle of a cell execution. Is there any way to recover what was “lost” at the beginning, for now?
I’m particularly interested because I wanted to check whether the “missing object” bounding box was discussed (I don’t think it was, but I think it might be worth discussing!)


What are some good blogs to follow? Especially ones that explain papers in a nicer way. I found a good one, but note that it doesn’t cover material from today’s lecture.