Tracking players in sports. Help framing the problem

I am mulling over a side project that I have wanted to do for a long time: automatically tracking football (soccer, if you are that kind of person) players in videos to help with technical analysis.

Good commercial products exist and I definitely don’t want to compete with them. I would just like to understand whether I could set up a simple system to play around with, without having to manually edit videos from scratch.

The final result I would like to get to is: starting from a video feed, I’d like to get to a 2D representation. This does not need to be real time.

I can split the problem into a few deep learning tasks (the rest is out of scope here):

  1. Identify the pitch in the video feed
  2. Identify people and ball in the video, splitting them into different teams/referee
  3. Make sure identities stay consistent throughout the feed, i.e. if a person is identified as “Team a player 3” in one frame, they are still “Team a player 3” in the rest of the video
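To make the "2D representation" part concrete: once the pitch has been located (task 1), going from image pixels to pitch coordinates is usually expressed as a homography. Below is a minimal sketch in plain Python, assuming four pixel/pitch correspondences are already known (for example the corners of a visible pitch area). The function names, the example pixel values and the 105 x 68 m pitch size are illustrative assumptions, not part of any dataset or library. In practice, a library routine such as OpenCV's `cv2.findHomography` would likely replace the hand-rolled solver.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_homography(pixel_pts, pitch_pts):
    """Fit a 3x3 homography (h33 fixed to 1) from exactly four point pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(pixel_pts, pitch_pts):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def to_pitch(h, x, y):
    """Map one pixel (x, y) to pitch coordinates using homography h."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v


# Hypothetical correspondences: pixel corners of a visible area -> metres.
pixel_corners = [(100, 500), (1180, 500), (900, 200), (380, 200)]
pitch_corners = [(0, 0), (105, 0), (105, 68), (0, 68)]
H = estimate_homography(pixel_corners, pitch_corners)
```

For a detected player, the point to project would typically be the bottom centre of the bounding box (roughly where the feet touch the ground), since the homography is only valid for points on the pitch plane.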

The reason I think this should be relatively doable is that there’s an abundance of available video, and the objects to be tracked are either people or a ball, so pretraining on more common datasets should not be an issue. However, I get stuck on the details, as I have never worked with video.

Due to personal constraints, I only have very small chunks of time to dedicate to this, so I’d love to bounce some ideas around on how to approach the problem (a vague general plan and a more concrete first step). Any ideas?

EDIT: I found this

After several days of digging into this, it turns out this is a beast of a task (with great commercial possibilities, but that’s way beyond the scope of my initial request).

Some of the problems:

  1. Detecting the ball is a very hard problem in football: in most frames there’s a lot of occlusion between the players and the ball, so doing it on a frame-by-frame basis is a no-no.
  2. Identifying the pitch is not a simple thing (way harder than I expected), as there are very few markings per frame, so again it’s very hard to do on a frame-by-frame basis.
  3. Tracking identities can be quite hard, as players go in and out of the picture constantly, they can go off the pitch, and in general some areas can be very crowded.
  4. All in all, what I thought was a relatively straightforward thing has a million different subtasks, all with their difficulties.
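As a small illustration of why the ball needs temporal context rather than frame-by-frame detection, here is a sketch that fills short gaps in a ball track by linear interpolation between the last and next detections. This is only one simple option, not something taken from the datasets or papers mentioned here; the function name and the `max_gap` parameter are hypothetical, and a real system would more likely use a motion model or a detector that looks at several frames at once.

```python
def fill_ball_gaps(track, max_gap=10):
    """Fill short gaps in a per-frame ball track.

    track: list with one entry per frame, either (x, y) or None when the
    detector found no ball. Gaps of at most max_gap missing frames that lie
    between two detections are linearly interpolated; longer gaps and gaps
    at the start or end of the track are left as None.
    """
    filled = list(track)
    last = None  # index of the most recent frame with a detection
    for i, pos in enumerate(track):
        if pos is None:
            continue
        if last is not None and 1 < i - last <= max_gap + 1:
            x0, y0 = track[last]
            x1, y1 = pos
            for j in range(last + 1, i):
                t = (j - last) / (i - last)
                filled[j] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        last = i
    return filled


# Ball seen in frames 0 and 3, missed in frames 1, 2 and 4.
example = fill_ball_gaps([(0.0, 0.0), None, None, (3.0, 6.0), None])
```

Straight-line interpolation is clearly wrong for long passes in the air or for bounces, which is why `max_gap` should stay small.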

I have found two main datasets:
SoccerNet (which has a bunch of annotations for many tasks, as it’s a yearly running challenge).

And (as posted in the video I linked in my original post) the Kaggle DFL Bundesliga competition

I have managed to convert SoccerNet data into Ultralytics YOLO dataset format, so I could train YOLO v8 for players/referees/goalkeeper detection and tracking with ok results (ball detection is still quite terrible) without having to manually label anything, but that’s as far as I got.
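For anyone wanting to repeat the conversion, the core of it is small. The sketch below is not my exact script; it assumes the tracking annotations come as MOT-style rows (`frame,track_id,left,top,width,height,...` in pixels) and that you have already built a mapping from track id to class index yourself. The function name and the `class_of_track` dictionary are hypothetical. The output is the Ultralytics YOLO label row: `class x_center y_center width height`, all normalised by image size.

```python
def mot_line_to_yolo(line, img_w, img_h, class_of_track):
    """Convert one MOT-style annotation row into a YOLO label row.

    Returns (frame_number, label_string) so the caller can append the label
    to the text file that belongs to that frame's image.
    """
    fields = line.strip().split(",")
    frame, track_id = int(fields[0]), int(fields[1])
    left, top, w, h = (float(v) for v in fields[2:6])
    # YOLO expects the box centre and size, normalised to [0, 1].
    cx = (left + w / 2) / img_w
    cy = (top + h / 2) / img_h
    cls = class_of_track[track_id]
    label = f"{cls} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"
    return frame, label


# Hypothetical example: track 7 was assigned class index 2.
frame, label = mot_line_to_yolo(
    "1,7,100,200,50,100,1,-1,-1,-1", img_w=1000, img_h=500,
    class_of_track={7: 2},
)
```

Each frame then gets one `.txt` file with one such row per object, next to (or mirrored from) the extracted image for that frame.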

All in all, given that it’s way harder than I expected, I don’t think I will progress on this, but I wanted to report my findings in case somebody else wants to have a shot.

Some recent references I found:
Banoth, Thulasya, Mohammad Farukh Hashmi, Zong Woo Geem, and Neeraj Bokde. ‘DeepPlayer-Track: Player and Referee Tracking With Jersey Color Recognition in Soccer’. IEEE Access 10 (1 January 2022): 1–1.

Berjón, Daniel, Carlos Cuevas, and Narciso García. ‘Soccer Line Mark Segmentation and Classification with Stochastic Watershed Transform’. arXiv, 3 August 2022.

Cioppa, Anthony, Silvio Giancola, Adrien Deliege, Le Kang, Xin Zhou, Zhiyu Cheng, Bernard Ghanem, and Marc Van Droogenbroeck. ‘SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos’. arXiv, 20 April 2022.

Citraro, Leonardo, Pablo Márquez-Neila, Stefano Savarè, Vivek Jayaram, Charles Dubout, Félix Renaut, Andrés Hasfura, Horesh Ben Shitrit, and Pascal Fua. ‘Real-Time Camera Pose Estimation for Sports Fields’. Machine Vision and Applications 31, no. 3 (March 2020): 16.

Cuevas, Carlos, Daniel Berjón, and Narciso García. ‘A Fully Automatic Method for Segmentation of Soccer Playing Fields’. Scientific Reports 13, no. 1 (26 January 2023): 1464.

Cuevas, Carlos, Daniel Quilón, and Narciso García. ‘Automatic Soccer Field of Play Registration’. Pattern Recognition 103 (1 July 2020): 107278.

Liu, Nian, Lu Liu, and Zengjun Sun. ‘Football Game Video Analysis Method with Deep Learning’. Computational Intelligence and Neuroscience 2022 (8 June 2022): 3284156.

Nie, Xiaohan, Shixing Chen, and Raffay Hamid. ‘A Robust and Efficient Framework for Sports-Field Registration’. In 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 1935–43. Waikoloa, HI, USA, 2021.

Seweryn, Karolina, Anna Wróblewska, and Szymon Łukasik. ‘Survey of Action Recognition, Spotting and Spatio-Temporal Localization in Soccer – Current Trends and Research Perspectives’. arXiv, 21 September 2023.

Sha, Long, Jennifer Hobbs, Panna Felsen, Xinyu Wei, Patrick Lucey, and Sujoy Ganguly. ‘End-to-End Camera Calibration for Broadcast Videos’. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13624–33. Seattle, WA, USA: IEEE, 2020.

Terven, Juan, and Diana Cordova-Esparza. ‘A Comprehensive Review of YOLO: From YOLOv1 and Beyond’. arXiv, 7 August 2023.

Zhang, Ruiheng, Lingxiang Wu, Yukun Yang, Wanneng Wu, Yueqiang Chen, and Min Xu. ‘Multi-Camera Multi-Player Tracking with Deep Player Identification in Sports Video’. Pattern Recognition 102 (1 June 2020): 107260.

Hi Miko

If you look at Rachel’s course (New course: Computational Linear Algebra), she discusses tracking a person walking across a yard. A player crossing a football field should be similar. I was wondering how you would deal with players in close proximity during a tackle or while marking, and how you would process a wall at an indirect free kick. Perhaps there is potential to read the names and numbers on the shirts.

Regards Conwyn

Thanks Conwyn!

I’ll definitely check it out, although right now the actual tracking is the least of my problems (there are a couple of good-enough trackers that work out of the box). Apparently (and I am not the only one to find this) even ball detection is a tough problem. Tough enough that the big, super-expensive software available only has it in beta…

Hello Mikl

I know this topic is probably 3-4 years old. However, I have embarked on the exact idea you posted, and the model is currently training. Would you mind if we connect?

Not that old (it is only 3-4 months). Happy to connect in the new year, but in the meantime do feel free to post here.