Very cool project!!
Your current approach is interesting - is it on GitHub / would you mind sharing the code?
Personally I’d approach it as a regression problem where you try to predict the (x,y) co-ordinates of each hand. There’s also a large amount of literature on pose estimation that you should check out. I think the cutting-edge approach is Mask R-CNN.
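To make the regression idea a bit more concrete, here's a rough sketch of how the targets could be set up: two hands become a flat vector of four numbers, scaled by the image size so the network predicts values in [0, 1]. The helper names (`encode_target`, `decode_prediction`, `l1_loss`), the 640x480 image size and the example numbers are all made up for illustration, and this is plain Python rather than any particular framework's API.

```python
# Hypothetical helpers for treating hand localisation as regression.
# Nothing here comes from a specific library; it is only a sketch.

def encode_target(hands, width, height):
    """Flatten [(x, y), (x, y)] pixel co-ordinates into
    [x1, y1, x2, y2], scaled to the range [0, 1]."""
    target = []
    for x, y in hands:
        target.extend([x / width, y / height])
    return target


def decode_prediction(pred, width, height):
    """Turn a flat [x1, y1, x2, y2] prediction back into pixel co-ordinates."""
    return [(pred[i] * width, pred[i + 1] * height)
            for i in range(0, len(pred), 2)]


def l1_loss(pred, target):
    """Mean absolute error between predicted and true co-ordinates."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


# Assumed example: a 640x480 image with one hand at (160, 120)
# and the other at (480, 360).
target = encode_target([(160, 120), (480, 360)], width=640, height=480)

# Pretend this came out of the network's final 4-unit layer.
pred = [0.30, 0.25, 0.70, 0.80]

print("loss:", l1_loss(pred, target))
print("predicted pixels:", decode_prediction(pred, 640, 480))
```

In a real model the final layer would just have four outputs trained against targets like these. L1 is only one reasonable loss choice here; mean squared error would slot in the same way.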
@brendan had a forum thread about implementing it here (not sure how far they got):
You should also check out lecture 11 of 2017’s CS231n:
Good luck!