Deep CNN for finding a homography transformation from video footage to a fixed 2D model

Hi there,

I would like to solve an interesting, fundamental problem in sports analytics, but I'm a bit lost at the moment. Hopefully somebody here can help evaluate my approach or point me in a different direction. I want to create a deep learning model that learns the homography mapping hockey video footage to the rink model, something similar to this:

I have had this problem in mind for months and have done quite a lot of research. I found that the traditional approach, using feature detection and RANSAC to find the homography between consecutive frames, kind of works, but I think this can be solved using deep learning as well.

The solution I have in mind so far is kind of similar to facial keypoint detection:

  1. Define a set of keypoints in the rink model, like (kp1, kp2, kp3...), that are always in fixed positions (mostly corners and other points that help the model tell different parts of the rink apart).
  2. Design a CNN that learns to detect these positions in the video footage, then pick the top 4 detected points, compute the homography from them, and do a warpPerspective to the rink model.
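
To make step 2 concrete, here is a minimal sketch of how a homography could be recovered once the network has produced keypoint locations. It assumes the keypoints are already detected and matched to their rink-model counterparts; the function names `estimate_homography` and `project_points` and all coordinates are made up for illustration. It uses the direct linear transform (DLT) in plain NumPy. In practice OpenCV's `cv2.findHomography` (optionally with `cv2.RANSAC`) followed by `cv2.warpPerspective` would likely be the more robust choice.

```python
import numpy as np


def estimate_homography(src_pts, dst_pts):
    """Estimate a 3x3 homography H with dst ~ H @ src from >= 4 point pairs (DLT)."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 2 or len(src) < 4:
        raise ValueError("need at least 4 matching (x, y) pairs")

    # Each correspondence (x, y) -> (u, v) contributes two linear equations in
    # the 9 entries of H.
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)

    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Fix the scale; this assumes H[2, 2] is not (close to) zero.
    return H / H[2, 2]


def project_points(H, pts):
    """Apply homography H to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical example: 4 keypoints detected in a video frame (pixels) and
# their known positions in the rink model (arbitrary model units).
frame_pts = [[120, 400], [1100, 380], [900, 150], [300, 160]]
rink_pts = [[0, 0], [30, 0], [30, 26], [0, 26]]

H = estimate_homography(frame_pts, rink_pts)
mapped = project_points(H, frame_pts)  # should land on rink_pts
```

With exactly 4 points the fit is exact, so a single bad detection corrupts the whole mapping. Using more than 4 detected keypoints together with a robust estimator is one way to reduce that risk.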

I think the CNN will be basically similar to facial keypoint detection, but the difference is that I don't have the whole rink in view at once (compared with the whole face), only a part of it. Hopefully that won't be a problem for the network to learn. I hope that the model can also learn the spatial structure of those keypoints.
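
One possible way to deal with the partial view, sketched here as an assumption rather than a tested solution, is to label each keypoint with a visibility flag and only penalise the network for keypoints that are actually in the frame. The function name `masked_keypoint_loss` and the array shapes are hypothetical. The example uses NumPy to show the arithmetic; a real training loop would use the equivalent operation in the deep learning framework.

```python
import numpy as np


def masked_keypoint_loss(pred, target, visible):
    """Mean squared distance over visible keypoints only.

    pred, target: arrays of shape (N, K, 2) with (x, y) per keypoint.
    visible:      boolean array of shape (N, K); False = keypoint not in frame.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mask = np.asarray(visible, dtype=bool)

    # Squared Euclidean distance per keypoint, shape (N, K).
    sq_err = ((pred - target) ** 2).sum(axis=-1)

    n_visible = mask.sum()
    if n_visible == 0:
        return 0.0  # nothing to supervise in this batch
    # Boolean indexing drops the off-screen keypoints entirely.
    return float(sq_err[mask].sum() / n_visible)


# Hypothetical batch with one frame and three rink keypoints; the third one
# is outside the camera view, so its (placeholder) target is ignored.
pred = [[[1.0, 1.0], [5.0, 5.0], [0.0, 0.0]]]
target = [[[1.0, 2.0], [5.0, 5.0], [100.0, 100.0]]]
visible = [[True, True, False]]

loss = masked_keypoint_loss(pred, target, visible)
```

Here only the first two keypoints contribute, so the large error on the off-screen third keypoint does not affect the loss.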

Do you think my approach above is feasible? I would really appreciate your help!

Many Thanks,


Hi @ptgamr, I like the proposed solution. I was also thinking about the same approach, with some further improvements such as using rink markers. I'm very interested in this topic. Let me know if you have achieved anything further; we could contribute together :slight_smile:.