After several days of digging into this, it turns out to be a beast of a task (with great commercial possibilities, but that’s way beyond the scope of my initial request).
Some of the problems:
- Detecting the ball is a very hard problem in football: in most frames there’s a lot of occlusion between players and the ball, so doing it on a frame-by-frame basis is a no-no.
- Identifying the pitch is not a simple thing (way harder than I expected), as there are very few markings visible in each frame, so again, it’s very hard to do on a frame-by-frame basis.
- Tracking identities can be quite hard, as players go in and out of the picture constantly, they can go off the pitch, and in general some areas can be very crowded.
- All in all, what I thought was a relatively straightforward thing has a million different subtasks, each with its own difficulties.
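For the ball, one way around per-frame failures is to reason over time instead. A minimal sketch, assuming you already have per-frame ball detections (with `None` where the detector found nothing), fills short gaps by linear interpolation between the detections on either side. `fill_ball_gaps` and its `max_gap` threshold are hypothetical names, not part of any library, and linear interpolation is only a rough stand-in for proper temporal ball trackers.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def fill_ball_gaps(
    detections: List[Optional[Point]], max_gap: int = 10
) -> List[Optional[Point]]:
    """Linearly interpolate missing ball positions across short gaps.

    detections[i] is the (x, y) ball centre in frame i, or None when the
    detector found nothing in that frame (e.g. the ball is occluded).
    Gaps longer than max_gap frames are left as None.
    """
    filled = list(detections)
    # Frames where the ball was actually detected.
    known = [i for i, p in enumerate(detections) if p is not None]
    # Walk over consecutive pairs of detections and fill the frames between them.
    for start, end in zip(known, known[1:]):
        gap = end - start - 1
        if gap == 0 or gap > max_gap:
            continue
        (x0, y0), (x1, y1) = detections[start], detections[end]
        span = end - start
        for k in range(1, gap + 1):
            filled[start + k] = (
                x0 + (x1 - x0) * k / span,
                y0 + (y1 - y0) * k / span,
            )
    return filled


if __name__ == "__main__":
    track = [(0.0, 0.0), None, None, (3.0, 6.0), None]
    print(fill_ball_gaps(track))
    # [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0), None]
```

Long gaps, and frames after the last sighting, are deliberately left empty: over many frames the ball may have been passed or shot, so a straight line between two sightings is likely to be wrong.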
I have found two main datasets:
- SoccerNet, which has annotations for many tasks, as it’s run as a yearly challenge.
- The Kaggle DFL Bundesliga competition (as shown in the video I linked in my original post).
I managed to convert the SoccerNet data into the Ultralytics YOLO dataset format, so I could train YOLOv8 for player/referee/goalkeeper detection and tracking without having to manually label anything. The results are OK (ball detection is still quite terrible), but that’s as far as I got.
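The SoccerNet-to-YOLO conversion mostly boils down to rescaling boxes. Here is a minimal sketch of that step under stated assumptions, rather than a record of an exact script: it assumes the SoccerNet-Tracking ground truth comes as MOT-style comma-separated rows (`frame,track_id,left,top,width,height,...`) in pixels, and that you already have a mapping from track ID to a YOLO class index (building it from each clip’s metadata is left out). YOLO detection labels are one line per object, `class x_center y_center width height`, normalised by the image width and height. The function names, the class indices and the 1920x1080 frame size in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def mot_row_to_yolo(
    row: str, track_to_class: Dict[int, int], img_w: int, img_h: int
) -> Optional[Tuple[int, str]]:
    """Convert one MOT-style row 'frame,id,left,top,w,h,...' (pixels) into
    (frame, 'class cx cy w h') with coordinates normalised to [0, 1].
    Returns None if the box lies entirely outside the image."""
    fields = row.strip().split(",")
    frame, track_id = int(fields[0]), int(fields[1])
    left, top, w, h = (float(v) for v in fields[2:6])
    # Clip the box to the image so the normalised values stay in [0, 1].
    x0, y0 = max(left, 0.0), max(top, 0.0)
    x1, y1 = min(left + w, float(img_w)), min(top + h, float(img_h))
    if x1 <= x0 or y1 <= y0:
        return None
    cx = (x0 + x1) / 2 / img_w
    cy = (y0 + y1) / 2 / img_h
    bw = (x1 - x0) / img_w
    bh = (y1 - y0) / img_h
    cls = track_to_class[track_id]  # hypothetical track-ID -> class mapping
    return frame, f"{cls} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}"


def convert(
    rows: Iterable[str], track_to_class: Dict[int, int], img_w: int, img_h: int
) -> Dict[int, List[str]]:
    """Group YOLO label lines by frame (one .txt label file per frame)."""
    labels: Dict[int, List[str]] = defaultdict(list)
    for row in rows:
        if not row.strip():
            continue  # skip blank lines
        result = mot_row_to_yolo(row, track_to_class, img_w, img_h)
        if result is not None:
            frame, line = result
            labels[frame].append(line)
    return dict(labels)


if __name__ == "__main__":
    # Assumed 1920x1080 frames; track 1 is a player (class 0), track 7 the ball (class 3).
    gt = ["1,1,900,500,40,80,1,-1,-1,-1", "1,7,950,520,10,10,1,-1,-1,-1"]
    for frame, lines in convert(gt, {1: 0, 7: 3}, 1920, 1080).items():
        print(f"{frame:06d}.txt", lines)
```

Clipping matters because boxes near the frame edge can extend past it. The zero-padded file name is just one convention; Ultralytics pairs each label file with the image of the same base name, so it has to match whatever the extracted frames are called.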
All in all, given that it’s way harder than I expected, I don’t think I will progress further on this, but I wanted to report my findings in case somebody else wants to have a shot.
Some recent references I found:
Banoth, Thulasya, Mohammad Farukh Hashmi, Zong Woo Geem, and Neeraj Bokde. ‘DeepPlayer-Track: Player and Referee Tracking With Jersey Color Recognition in Soccer’. IEEE Access 10 (1 January 2022): 1–1.
Berjón, Daniel, Carlos Cuevas, and Narciso García. ‘Soccer Line Mark Segmentation and Classification with Stochastic Watershed Transform’. arXiv, 3 August 2022. https://doi.org/10.48550/arXiv.2108.06432.
Cioppa, Anthony, Silvio Giancola, Adrien Deliege, Le Kang, Xin Zhou, Zhiyu Cheng, Bernard Ghanem, and Marc Van Droogenbroeck. ‘SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos’. arXiv, 20 April 2022. https://doi.org/10.48550/arXiv.2204.06918.
Citraro, Leonardo, Pablo Márquez-Neila, Stefano Savarè, Vivek Jayaram, Charles Dubout, Félix Renaut, Andrés Hasfura, Horesh Ben Shitrit, and Pascal Fua. ‘Real-Time Camera Pose Estimation for Sports Fields’. Machine Vision and Applications 31, no. 3 (March 2020): 16.
Cuevas, Carlos, Daniel Berjón, and Narciso García. ‘A Fully Automatic Method for Segmentation of Soccer Playing Fields’. Scientific Reports 13, no. 1 (26 January 2023): 1464.
Cuevas, Carlos, Daniel Quilón, and Narciso García. ‘Automatic Soccer Field of Play Registration’. Pattern Recognition 103 (1 July 2020): 107278.
Liu, Nian, Lu Liu, and Zengjun Sun. ‘Football Game Video Analysis Method with Deep Learning’. Computational Intelligence and Neuroscience 2022 (8 June 2022): 3284156.
Nie, Xiaohan, Shixing Chen, and Raffay Hamid. ‘A Robust and Efficient Framework for Sports-Field Registration’. In 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 1935–43. Waikoloa, HI, USA, 2021.
Seweryn, Karolina, Anna Wróblewska, and Szymon Łukasik. ‘Survey of Action Recognition, Spotting and Spatio-Temporal Localization in Soccer – Current Trends and Research Perspectives’. arXiv, 21 September 2023. https://doi.org/10.48550/arXiv.2309.12067.
Sha, Long, Jennifer Hobbs, Panna Felsen, Xinyu Wei, Patrick Lucey, and Sujoy Ganguly. ‘End-to-End Camera Calibration for Broadcast Videos’. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13624–33. Seattle, WA, USA: IEEE, 2020.
Terven, Juan, and Diana Cordova-Esparza. ‘A Comprehensive Review of YOLO: From YOLOv1 and Beyond’. arXiv, 7 August 2023. https://doi.org/10.48550/arXiv.2304.00501.
Zhang, Ruiheng, Lingxiang Wu, Yukun Yang, Wanneng Wu, Yueqiang Chen, and Min Xu. ‘Multi-Camera Multi-Player Tracking with Deep Player Identification in Sports Video’. Pattern Recognition 102 (1 June 2020): 107260.