Lesson 8 adapted for counting objects

rohitgeo · March 23, 2018, 7:02pm

I adapted lesson 8 to count the number of objects instead of predicting their bounding boxes; the notebook is shared here: https://github.com/rohitgeo/deepdive/blob/master/fastai/PascalVOCObjectCount.ipynb

I treated it as a regression problem similar to that of predicting the bounding boxes.
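For reference, here is a minimal sketch of what "count as a regression target" means, written in plain Python rather than the notebook's fastai/PyTorch code. It assumes a hypothetical annotation format (one list of `(x0, y0, x1, y1)` boxes per image) and an L1 loss; the loss actually used in the notebook may differ. The network emits a single unbounded float per image, and the target is simply the number of boxes.

```python
from typing import Sequence, Tuple

# Hypothetical box format for illustration: (x0, y0, x1, y1)
Box = Tuple[float, float, float, float]

def count_target(boxes: Sequence[Box]) -> float:
    """Regression target: the number of annotated objects, as a float."""
    return float(len(boxes))

def l1_loss(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean absolute error between predicted and true counts."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

# Two hypothetical images: one with 3 boxes, one with 1.
annotations = [
    [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)],
    [(5, 5, 15, 15)],
]
targets = [count_target(b) for b in annotations]  # [3.0, 1.0]
preds = [2.6, 1.3]                                # made-up network outputs
loss = l1_loss(preds, targets)                    # roughly 0.35
```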

I read about a similar approach in this paper that discusses the problem of counting cars using a one-look technique. What’s interesting is that they treated it as a classification problem using a softmax with 64 outputs. They state that “a regression output may also be reasonable, but the maximum number of cars in a scene is sufficiently small that a softmax is feasible”.
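A softmax formulation like the paper's instead treats each possible count as a class. The sketch below, again plain Python with hypothetical helper names, maps a count to one of 64 bins (0 to 63) and scores a prediction with cross-entropy. Rejecting out-of-range counts is an assumption made here; clamping them into the last bin would be another option.

```python
import math
from typing import List, Sequence

NUM_CLASSES = 64  # bins for counts 0..63, matching a 64-way softmax

def count_to_class(count: int, num_classes: int = NUM_CLASSES) -> int:
    """Map an object count to a class index; the count must fit in the bins."""
    if not 0 <= count < num_classes:
        raise ValueError(f"count {count} outside 0..{num_classes - 1}")
    return count

def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability assigned to the true count bin."""
    return -math.log(softmax(logits)[target])
```

With all-zero logits every bin gets probability 1/64, so `cross_entropy([0.0] * 64, count_to_class(3))` is `log(64)` no matter which count is correct.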

I’ll try the other approach as well to see which works better, but if anyone knows reasons why one approach would be better than the other, I’d love to hear them. Thanks!

daveluo · March 23, 2018, 8:06pm

hey @rohitgeo, thanks for the notebook and interesting paper! Look forward to going through them in more detail.

After skimming the counting section of the paper, some initial guesses as to your softmax vs regression output question:

They’re doing other classification and detection tasks using softmax in section 4 of the paper, so maybe it was easier/faster from a workflow perspective to change as little as possible and only update the class labels?
Maybe it’s faster or easier to train a softmax with 64 classes because the output is restricted to 64 discrete bins (0, 1, 2, …, 63), whereas a regression output is continuous and unbounded. They checked that they “cover the entire countable interval” and observed no more than 61 cars in any one patch, so my best guess at the moment is that they’re trying to keep the training objective as constrained as possible and as close as possible to their dataset.

jeremy · March 24, 2018, 10:36pm

Interesting paper. I see that your results at the moment aren’t looking great; intuitively that’s what I would expect, since it’s hard to count objects without a network that can detect each object. In the paper they’re counting objects of only one class (car), which is a much easier problem.

(@groverpr is working on a similar problem so tagging him)

groverpr · March 24, 2018, 11:21pm

@rohitgeo Interesting! Thanks for sharing this. IMO, the only difference between using a regression layer and a classification layer at the end is float vs. integer output. I guess that shouldn’t matter much, and they probably chose to solve it as a classification problem because of their overall architecture design. But if you find anything interesting in your experiments, do share it with us.
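The practical side of the float vs. integer difference shows up when turning predictions into counts. A minimal sketch with hypothetical helper names, in plain Python: a regression output has to be rounded and clipped at zero, while a classification output is just the index of the largest logit and can never fall outside the bins.

```python
from typing import Sequence

def decode_regression(pred: float) -> int:
    """Round a float count prediction and clip negative values to zero."""
    return max(0, round(pred))

def decode_classification(logits: Sequence[float]) -> int:
    """The predicted count is the index of the highest-scoring bin."""
    return max(range(len(logits)), key=lambda i: logits[i])
```

Python's `round` uses round-half-to-even, so `round(2.5)` is `2`; whether that matters depends on how often predictions land exactly on .5.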

But like @jeremy said, the network should first recognize whether or not there is an object (among all viable object options) in a particular subsection of the image. Otherwise the network, with either a regression layer or a softmax layer, might also learn to count background noise in the image. Another thing is that we don’t want our network to count everything it sees, such as mountains, trees, etc.; we should have a list of possible objects to count. I guess a more accurate approach would be to divide each image into a grid of cells.

Let’s say you have 10x10 images in the training data and your initial guess is that there can be at most 100 objects in an image. Divide each image into 100 square grid cells. The output would be 100 neurons, one per cell, each telling whether that cell contains a viable object or not. To avoid counting the same object in multiple cells, labels can be made so that output = 1 only for the cell containing the object’s midpoint. That would also exclude objects only partly inside the image, since their midpoints won’t be in any cell. Then, to count the number of objects in each image, you can just count the number of cells whose output probability exceeds some threshold.
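The grid scheme described above can be sketched in plain Python. Assumptions: boxes are given as hypothetical `(x0, y0, x1, y1)` pixel coordinates in a square image, cells are indexed row-major, and the network emits one independent probability per cell (e.g. a sigmoid per cell, since a single softmax across all cells would force the probabilities to sum to 1). The helper names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical box format for illustration: (x0, y0, x1, y1) in pixels
Box = Tuple[float, float, float, float]

def grid_labels(boxes: Sequence[Box], img_size: float, grid: int) -> List[int]:
    """One 0/1 label per cell (row-major): 1 if some box's midpoint is in it."""
    cell = img_size / grid
    labels = [0] * (grid * grid)
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if not (0 <= cx < img_size and 0 <= cy < img_size):
            continue  # midpoint outside the image: object is not counted
        col, row = int(cx // cell), int(cy // cell)
        labels[row * grid + col] = 1
    return labels

def sigmoid(x: float) -> float:
    """Independent per-cell probability from a raw output."""
    return 1 / (1 + math.exp(-x))

def count_from_probs(probs: Sequence[float], threshold: float = 0.5) -> int:
    """Predicted count = number of cells whose probability exceeds threshold."""
    return sum(1 for p in probs if p > threshold)

# 10x10 image split into a 10x10 grid: the third box's midpoint (10, 10)
# lies outside the image, so only two cells are labelled.
labels = grid_labels([(0, 0, 2, 2), (4, 4, 6, 8), (8, 8, 12, 12)], 10, 10)
```

With one 0/1 label per cell, two objects whose midpoints land in the same cell are counted once, so the grid needs to be fine enough that this rarely happens.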

This is supposed to give better results, but creating labels would be more difficult.

This approach was proposed in the YOLO paper for multiple object detection.

Also, if it interests you, you can check out this blog post about multiple object detection and localization that I wrote a while back. It walks step by step through tweaking the final layer of a neural net for different problems.

rohitgeo · March 25, 2018, 5:59am

Thank you all for the feedback.
I updated the notebook to use classification (for counting) as well - that was quite easy to do!

Your feedback, coupled with trying out both approaches, helped me understand why they did it that way in the paper and why I’m not getting such good results. @groverpr’s explanation and blog post on YOLO finally gave me an intuitive sense of YOLO. I cheated and skimmed through the upcoming notebooks (Pascal-multi), and I’m eagerly looking forward to the class where we discuss them.