Finding (many) circles: Regression meets Multi-label task

Hello there,

I have worked through a good chunk of the book and, after some practice, I think I am ready to start setting up a NN for my own task.

But so far I could not figure out how to set up my architecture. The task is both a regression and a multi-label task.

Here is my rough task:

Basically, do a regression task very similar to the one on the Biwi Kinect Head Pose Database, with two differences:

  1. In addition to a coordinate [x, y], there is also a radius, so a label would be 3 floats: [x, y, r]. And the more difficult part:
  2. Each image can have several labels, or none at all.

In other words: find the circles in the image!

Now, multi-label tasks are usually handled with one-hot encoding: a bit vector with the length of the maximum number of labels in the data set, in which the bits belonging to the classes that are present get flipped. In my case I would swap the bits out for my label tuples:

# one label
y_1 = (0.2, 0.3, 0.01) # x, y, r
one_hot = y_1, (0,0,0), (0,0,0), (0,0,0), ...
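To make this concrete, here is a minimal sketch of how I would build such a padded target. `MAX_CIRCLES` and `encode_circles` are names I made up for illustration, and the cap of 4 slots is arbitrary:

```python
MAX_CIRCLES = 4  # hypothetical cap: the most circles seen in any training image

def encode_circles(circles, max_circles=MAX_CIRCLES):
    """Pad a list of (x, y, r) tuples with (0, 0, 0) up to a fixed number of slots."""
    if len(circles) > max_circles:
        raise ValueError(f"got {len(circles)} circles but only {max_circles} slots")
    padding = [(0.0, 0.0, 0.0)] * (max_circles - len(circles))
    return [tuple(c) for c in circles] + padding

y_1 = (0.2, 0.3, 0.01)  # x, y, r
print(encode_circles([y_1]))  # one real label followed by three empty slots
print(encode_circles([]))     # image without circles: only empty slots
```

The target always has the same shape, which is what a fixed-size output layer needs, but an image with more than `MAX_CIRCLES` circles cannot be represented at all.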

However there are several problems with this approach:

  1. The length of the one_hot encoding vector is the maximum number of labels per image in my training data. What if, in reality, I see an image with more labels than that?
  2. Learning (0,0,0) might not be the best choice; maybe a non-existent label should be encoded differently? E.g. what if two values of a legitimate label are 0 but the radius isn’t? Because 2 of 3 values are 0, the network could treat the whole thing as “pretty much zero”. A zero location does not mean “less significant”. A non-label should mostly be characterized by a zero radius.
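For the second problem, one option I could imagine is an explicit "present" flag per slot, so that a circle at location (0, 0) stays distinguishable from an empty slot. This is only an assumption on my part, not something taken from the book, and `encode_with_flag` is a hypothetical helper:

```python
def encode_with_flag(circles, max_circles=4):
    """Encode each slot as (present, x, y, r); empty slots get present = 0.0."""
    if len(circles) > max_circles:
        raise ValueError(f"got {len(circles)} circles but only {max_circles} slots")
    slots = [(1.0, x, y, r) for (x, y, r) in circles]
    slots += [(0.0, 0.0, 0.0, 0.0)] * (max_circles - len(circles))
    return slots

# A legitimate circle in the corner (x = 0, y = 0) keeps present = 1.0,
# so it no longer looks "pretty much zero" next to the empty slots.
print(encode_with_flag([(0.0, 0.0, 0.05)]))
```

With such a flag, whether a slot is used becomes a yes/no question of its own, separate from the regression of x, y and r.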

Another idea would be to repeat each image once per label it contains. E.g. if there are three labels, treat the image as three training examples, each paired with one of the labels. However, I think this would probably really mess up learning, and more importantly: how would I know that I have to feed in an image several times during inference? And how many times?

So ideally I would like the output of the NN for each image to be a list of label tuples (x, y, r), or an empty list if there are none.

output_1 = [] # no circles found
output_2 = [(0.1, 0.3, 0.03)]  # one circle found
output_3 = [(0.1, 0.3, 0.03), (0.3, 0.4, 0.1), ...] # many circles found

Now I know that getting an empty list back when there is no label is probably not possible. So instead, if I could get a list back that always contains at least one item, I could check the radius of each item against a threshold. If an item comes back and its radius is smaller than my threshold, I can treat it as a non-label.
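The thresholding step I have in mind would look roughly like this. `decode_circles` and the value of `RADIUS_THRESHOLD` are placeholders I made up, and `raw_output` is just hand-written example data, not real network output:

```python
RADIUS_THRESHOLD = 0.005  # hypothetical cutoff, would need tuning on real data

def decode_circles(slots, radius_threshold=RADIUS_THRESHOLD):
    """Keep only the (x, y, r) slots whose radius reaches the threshold."""
    return [(x, y, r) for (x, y, r) in slots if r >= radius_threshold]

# Pretend the network always returns a fixed number of (x, y, r) slots.
raw_output = [(0.1, 0.3, 0.03), (0.02, 0.01, 0.001), (0.3, 0.4, 0.1), (0.0, 0.0, 0.0)]
print(decode_circles(raw_output))  # [(0.1, 0.3, 0.03), (0.3, 0.4, 0.1)]
```

This would turn a fixed-size output into the variable-length list I am after, including the empty list when every radius falls below the threshold.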

However, how would I even be able to return such a list? Could anyone point me in the right direction? Is this even possible? When I read the 6th chapter I was happy to see both concepts, but combining them seems very difficult to me.