Little guidance needed to understand CNN regression

Pager07 · January 5, 2020, 7:59pm

My understanding is that regression will always produce an output given an input. However, for classification, a given input may or may not produce an output, because the confidence score may be below the threshold.

For example, take the problem of localizing a basketball in an image.
Running Faster R-CNN (classification), a ball is either detected or it's not. Whereas if you run CNN regression to predict the ball's location coordinates, it will always give a prediction of the ball's coordinates. (Please correct me if I am wrong)

Am I right?

If the above is true, does it mean the following:
I can pass an image where the ball is occluded and the CNN regression will still be able to predict the ball's location in the image? (Assuming that the training set included occluded images)

I look forward to hearing from someone.

muellerzr · January 5, 2020, 8:02pm

No. Unless you use a sigmoid, classification will always produce a prediction (the scores are somewhere between 0 and 1 and you take the maximum). With a sigmoid it may not, because each score is compared against a threshold. And the same goes for regression: yes, it will always output a prediction. But usually the “not there” case is encoded as something like [0,0]. If you train for it, though, the model may be able to give you a close answer.
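To make the difference concrete, here is a minimal sketch in plain Python (no framework). The function names, the class list, and the 0.5 threshold are made up for illustration and are not taken from any library:

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Squash one raw score into (0, 1), independently of the others."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_single_label(logits, classes):
    """Softmax + argmax: some class is always returned, however weak the scores."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best]

def predict_multi_label(logits, classes, thresh=0.5):
    """Sigmoid + threshold: the result can be empty if nothing is confident enough."""
    return [c for c, x in zip(classes, logits) if sigmoid(x) >= thresh]

classes = ["ball", "player"]   # hypothetical class names
weak_logits = [-2.0, -3.0]     # the model is not confident about anything

print(predict_single_label(weak_logits, classes))  # ball
print(predict_multi_label(weak_logits, classes))   # []
```

With the same weak scores, the argmax version still names a class, while the thresholded version returns nothing. A regression head behaves like the first case: it always emits numbers.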

Pager07 · January 5, 2020, 8:11pm

Thanks a lot for the response.

I am working on a project to do action recognition in basketball videos. I need to detect the basketball in the video.

The ball itself is low res, often deformed due to high speeds, sometimes occluded.

I really think training a CNN regression model to predict the ball's location is better than training a model to search for the ball in the image using ball features.

Could you please give me some pointers, @muellerzr?

muellerzr · January 5, 2020, 8:13pm

You could certainly train a NN to detect the center of the ball, which would wind up being essentially the same as the head pose example in terms of how well it will do. You'd be surprised what a regression ResNet is capable of.
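As a rough sketch of what "regress the center" means for the labels and the loss: the helpers below are hypothetical plain-Python functions, and scaling coordinates to [-1, 1] is just one common convention, assumed here for illustration. The network itself would simply end in two outputs trained against this target.

```python
def to_target(cx, cy, width, height):
    """Scale a pixel center (cx, cy) to the range [-1, 1] on both axes."""
    return (2.0 * cx / width - 1.0, 2.0 * cy / height - 1.0)

def to_pixels(tx, ty, width, height):
    """Invert to_target: map a [-1, 1] prediction back to pixel coordinates."""
    return ((tx + 1.0) * width / 2.0, (ty + 1.0) * height / 2.0)

def mse(pred, target):
    """Mean squared error between a predicted and a true point."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

# A ball centered at pixel (320, 120) in a 640x480 frame.
target = to_target(320, 120, width=640, height=480)
print(target)                                         # (0.0, -0.5)
print(to_pixels(*target, width=640, height=480))      # (320.0, 120.0)

# Loss for a prediction that is slightly off along x.
print(mse((0.1, -0.5), target))
```

Normalizing the target keeps the loss on the same scale regardless of image resolution, which is why it is a popular choice for this kind of setup.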

Pager07 · January 5, 2020, 8:26pm

Makes sense.
How does someone go about making a dataset like the head pose one? In my mind it would take hundreds of hours to label thousands of images. I have never made my own dataset from scratch before, so it definitely seems scary to make one.
I am planning to make a similar dataset with images of a basketball game.

Is there an efficient way to do it? What size of dataset should I be looking at?

I am sorry for asking so many questions, but I just want to learn. @muellerzr

muellerzr · January 5, 2020, 8:42pm

Perfectly fine! I’d say start small, a few hundred images, see how the model does on some unlabeled data, and go from there. It is certainly scary; my first one was a very large dataset and it was a headache. Look for a keypoint labeling tool (I know some exist). If I can find it, I’ll also post the one I used.
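For the "start small and check" loop, here is one possible sketch. It assumes the labeling tool can export a CSV with `filename,x,y` columns (a made-up layout, as are the file names and function names), holds out a few labeled frames, and measures the average pixel error on them. That numeric check is an addition on top of eyeballing predictions on unlabeled frames, not a replacement for it.

```python
import csv
import io
import random

def read_labels(csv_file):
    """Read rows of `filename,x,y` into (filename, (x, y)) pairs."""
    reader = csv.DictReader(csv_file)
    return [(row["filename"], (float(row["x"]), float(row["y"]))) for row in reader]

def split_labels(items, valid_frac=0.2, seed=42):
    """Shuffle reproducibly and hold out a fraction (at least one item) for validation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_valid = max(1, int(len(items) * valid_frac))
    return items[n_valid:], items[:n_valid]

def mean_pixel_error(preds, targets):
    """Average Euclidean distance, in pixels, between predicted and true centers."""
    dists = [((px - tx) ** 2 + (py - ty) ** 2) ** 0.5
             for (px, py), (tx, ty) in zip(preds, targets)]
    return sum(dists) / len(dists)

# Hypothetical export from a labeling tool; a real file would be opened with open().
fake_csv = io.StringIO(
    "filename,x,y\n"
    "frame_001.jpg,320,120\n"
    "frame_002.jpg,300,140\n"
    "frame_003.jpg,280,165\n"
    "frame_004.jpg,262,190\n"
    "frame_005.jpg,240,220\n"
)
train, valid = split_labels(read_labels(fake_csv))
print(len(train), len(valid))                        # 4 1

# A prediction that is 3 px off in x and 4 px off in y is 5 px away.
print(mean_pixel_error([(3.0, 4.0)], [(0.0, 0.0)]))  # 5.0
```

If the held-out error is still large after a few hundred labels, that is a hint to label more frames, especially the hard ones (blurred or occluded balls).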

Pager07 · January 5, 2020, 8:45pm

Really curious, what was it for? Did it take you days?

Thank you

muellerzr · January 5, 2020, 8:46pm

It was for the Lockheed Martin drone racing competition last year. The task was finding the four corners of a gate, and the data was horribly labeled, so we had to label our own data until they fixed it (they had outsourced the labeling). Several thousand images (10k or so).

Pager07 · January 5, 2020, 8:52pm

Cool!! I will start building it and try getting some results. I will post back if something interesting comes out of it.