I have been working through the book, and after a good chunk of it and some practice I think I am ready to start setting up a neural network (NN) for my own task.
So far, however, I have not figured out how to set up the architecture. The task is both a regression and a multi-label problem.
Here is my rough task:
Basically, it is a regression task very similar to the one on the Biwi Kinect Head Pose Database, with two differences:
- In addition to a coordinate [x, y], each label also has a radius, so a label consists of 3 floats: [x, y, r]. And the more difficult part:
- An image can have several labels or none at all.

In short: find all the circles in an image!
For multi-label tasks, the approach is one-hot encoding: a vector whose length is the number of possible classes, with a 1 at the position of every class that is present. In my case I would swap out the bits for my label tuples, so the vector's length becomes the maximum number of labels in any image of the data set:

    # one label
    y_1 = (0.2, 0.3, 0.01)  # x, y, r
    one_hot = [y_1, (0, 0, 0), (0, 0, 0), (0, 0, 0), ...]

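The padding idea above can be sketched in plain Python as follows. This is a minimal sketch: `pad_labels`, `unflatten` and `MAX_LABELS` are hypothetical names, and it assumes the network ends in a layer with `3 * MAX_LABELS` outputs that get reshaped back into `(x, y, r)` triples.

```python
MAX_LABELS = 4  # hypothetical: the most circles found in any training image


def pad_labels(labels, max_labels=MAX_LABELS):
    """Pad a list of (x, y, r) tuples with (0, 0, 0) up to max_labels and
    flatten it into one fixed-length target of 3 * max_labels floats."""
    if len(labels) > max_labels:
        # The failure case: an image with more circles than there are slots.
        raise ValueError(f"{len(labels)} labels, but only {max_labels} slots")
    padded = list(labels) + [(0.0, 0.0, 0.0)] * (max_labels - len(labels))
    return [value for triple in padded for value in triple]


def unflatten(flat):
    """Turn a flat output of 3 * N floats back into N (x, y, r) tuples."""
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


target = pad_labels([(0.2, 0.3, 0.01)])
print(len(target))             # 12
print(unflatten(target)[:2])   # [(0.2, 0.3, 0.01), (0.0, 0.0, 0.0)]
```

The fixed length is what makes this usable with an ordinary final linear layer, and it is also the source of the first problem below.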
However, there are several problems with this approach:
- The length of the one_hot vector is the maximum number of labels in any image of my training data. What if, in practice, I encounter an image with more labels than that?
- Learning (0, 0, 0) might not be ideal; maybe a non-existent label should be encoded differently. For example, what if a legitimate label has x = 0 and y = 0 but a non-zero radius? Because 2 of its 3 values are 0, the network could treat the whole label as “pretty much zero”, even though a location of zero does not make a label “less significant”. Whether a label exists should mostly depend on whether its radius is zero.
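One way to address the (0, 0, 0) problem is to give every slot an explicit presence value `p` and only penalize the coordinates of slots that hold a real circle. This is a minimal sketch under two assumptions: squared error is used on `p` for simplicity (a binary cross-entropy on a sigmoid output would be the more usual choice), and slots are matched to labels purely by position, which assumes the labels are listed in a consistent order (for example sorted by x). `encode_with_presence` and `masked_loss` are hypothetical names.

```python
def encode_with_presence(labels, max_labels):
    """Encode up to max_labels circles as (x, y, r, p) slots, where
    p = 1.0 marks a real circle and p = 0.0 an empty slot."""
    if len(labels) > max_labels:
        raise ValueError("more circles than slots")
    slots = [(x, y, r, 1.0) for x, y, r in labels]
    slots += [(0.0, 0.0, 0.0, 0.0)] * (max_labels - len(labels))
    return slots


def masked_loss(pred, target):
    """Squared error on p for every slot, plus squared error on (x, y, r)
    only for slots that hold a real circle. Empty slots therefore never
    pull the predicted coordinates towards (0, 0)."""
    total = 0.0
    for (px, py, pr, pp), (tx, ty, tr, tp) in zip(pred, target):
        total += (pp - tp) ** 2  # always learn "is there a circle here?"
        if tp == 1.0:            # only real circles have meaningful coordinates
            total += (px - tx) ** 2 + (py - ty) ** 2 + (pr - tr) ** 2
    return total / len(target)


# A real circle centred at (0, 0) is perfectly valid here.
target = encode_with_presence([(0.0, 0.0, 0.05)], max_labels=2)
pred = [(0.0, 0.0, 0.05, 1.0), (0.7, 0.2, 0.3, 0.0)]
print(masked_loss(pred, target))  # 0.0: the empty slot's coordinates are ignored
```

With this encoding, "is there a circle?" is decided by `p` instead of by how close the whole triple is to zero.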
Another idea would be to reuse each image once per label. For example, if an image has three labels, treat it as three images, each with one of those labels. However, I suspect this would seriously hurt learning, and more importantly: how would I know during inference that I have to feed an image in several times? And how many times?
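The suspicion about hurting learning can be made concrete: if the same image appears with several different targets, the single prediction that minimizes mean squared error over them is their average. A small sketch with three made-up circles (the values are hypothetical) shows that this average can lie where there is no circle at all.

```python
def mse(pred, targets):
    """Mean squared error of one prediction against several targets."""
    return sum(
        sum((p - t) ** 2 for p, t in zip(pred, tgt)) for tgt in targets
    ) / len(targets)


# The same hypothetical image, duplicated three times with one circle each.
circles = [(0.2, 0.2, 0.1), (0.8, 0.2, 0.1), (0.5, 0.8, 0.1)]
mean = tuple(sum(c[i] for c in circles) / len(circles) for i in range(3))

print(tuple(round(v, 3) for v in mean))             # (0.5, 0.4, 0.1)
print(mse(mean, circles) < mse(circles[0], circles))  # True
```

The centre (0.5, 0.4) is farther than the radius 0.1 from every real centre, so the loss-optimal answer for the duplicated image is not any of the circles.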
So ideally, for each image I would like the NN to output a list of label tuples (x, y, r), or an empty list if there are none:

    output_1 = []  # no circles found
    output_2 = [(0.1, 0.3, 0.03)]  # one circle found
    output_3 = [(0.1, 0.3, 0.03), (0.3, 0.4, 0.1), ...]  # many circles found

Now, I realize that having the network return an empty list when there is no label is probably not possible. So instead, if I could get back a list that contains at least one item, I could check the radius against a threshold: if an item comes back with a radius smaller than the threshold, I treat it as a non-label.
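The radius-threshold idea amounts to a small post-processing step: the network always returns a fixed number of `(x, y, r)` slots, and the variable-length (possibly empty) list is produced afterwards. A minimal sketch; `decode` and the default threshold `0.005` are hypothetical choices.

```python
def decode(slots, r_threshold=0.005):
    """Turn a fixed-size network output (a list of (x, y, r) slots) into a
    variable-length list of circles by dropping every slot whose predicted
    radius is below r_threshold. The result may be empty."""
    return [(x, y, r) for x, y, r in slots if r >= r_threshold]


raw = [(0.1, 0.3, 0.03), (0.41, 0.02, 0.001), (0.0, 0.0, 0.0)]
print(decode(raw))                # [(0.1, 0.3, 0.03)]
print(decode([(0.2, 0.2, 0.0)]))  # []
```

So the empty list never has to come out of the network itself; it only appears after thresholding.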
But how would I even return such a list? Could anyone point me in the right direction? Is this even possible? When I read chapter 6, I was happy to see both concepts covered, but combining them seems very difficult to me.
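One possible direction, in the spirit of grid-based single-shot detectors such as YOLO, is to split the image into a grid and let every cell predict `(p, dx, dy, r)` for at most one circle whose centre falls inside it. The network then outputs a fixed `GRID x GRID x 4` tensor (for example from a convolutional head), so the maximum number of circles is `GRID * GRID` rather than the maximum seen in training. This is a sketch under stated assumptions: `GRID`, `encode_grid` and `decode_grid` are hypothetical, x and y are assumed normalized to [0, 1], and two circles whose centres share a cell cannot both be represented.

```python
GRID = 4  # hypothetical grid size: the image is split into GRID x GRID cells


def encode_grid(circles, grid=GRID):
    """Build a grid x grid target where each cell holds (p, dx, dy, r).
    A circle is assigned to the cell containing its centre; dx and dy are
    the centre's offset inside that cell, in [0, 1]."""
    target = [[(0.0, 0.0, 0.0, 0.0) for _ in range(grid)] for _ in range(grid)]
    for x, y, r in circles:
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        # Limitation: a second circle in the same cell overwrites the first.
        target[row][col] = (1.0, x * grid - col, y * grid - row, r)
    return target


def decode_grid(pred, p_threshold=0.5):
    """Turn a grid x grid prediction back into a (possibly empty) list of
    (x, y, r) circles, keeping only cells whose presence p exceeds p_threshold."""
    grid = len(pred)
    circles = []
    for row in range(grid):
        for col in range(grid):
            p, dx, dy, r = pred[row][col]
            if p > p_threshold:
                circles.append(((col + dx) / grid, (row + dy) / grid, r))
    return circles


# Round trip: recovers the input circles up to floating-point rounding.
decoded = decode_grid(encode_grid([(0.1, 0.3, 0.03), (0.6, 0.7, 0.1)]))
print(len(decoded))                   # 2
print(decode_grid(encode_grid([])))   # []
```

During training, the loss would be computed cell by cell against `encode_grid(...)`, in the same masked way as the slot-based sketch: presence everywhere, coordinates only where p = 1.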