Outperforming the human eye with a deep learning data generation system
In earlier blog posts on our next-gen football tracking solution, BallJames, we looked at several of the deep learning components that make up the system: the YOLO-based detection algorithm, our tracking component and the ball tracking system. This post covers yet another component: team and player identification. These components are crucial, as they allow BallJames to decide which player has been detected without any human assistance.
BallJames distinguishes five types of teams: the field players of the home and away sides, their keepers and the referees. At the beginning of the match, an unsupervised learning algorithm learns to tell the teams apart by looking at the people on the pitch just before kick-off, clustering all observations into one of these five groups. We optimize these clusters and are then able to reliably classify players throughout the entire match.
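As a rough illustration of this kind of unsupervised grouping, the sketch below clusters per-player colour features into five groups with k-means. The choice of feature (mean torso colour) and of k-means itself are assumptions for the sake of the example, not the actual BallJames algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

# Five groups: home field players, away field players, both keepers, referees.
N_GROUPS = 5

def cluster_players(torso_colors: np.ndarray) -> np.ndarray:
    """Assign each detected person near kick-off to one of five kit clusters.

    torso_colors: (n_players, 3) array of mean RGB values per person crop
    (a hypothetical feature; any colour/appearance descriptor would do).
    """
    kmeans = KMeans(n_clusters=N_GROUPS, n_init=10, random_state=0)
    return kmeans.fit_predict(torso_colors)

# Toy demo: five tight colour blobs stand in for five distinct kits.
rng = np.random.default_rng(0)
centers = np.array([[200, 30, 30], [30, 30, 200], [30, 200, 30],
                    [240, 240, 30], [10, 10, 10]], dtype=float)
colors = np.vstack([c + rng.normal(0, 5, size=(20, 3)) for c in centers])
labels = cluster_players(colors)
print(len(set(labels)))  # five distinct groups found
```

Once the clusters are fixed, every later detection can be assigned to its nearest cluster centre, which is what makes classification cheap for the rest of the match.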
We train a deep learning model on a large dataset to extract more information about each player, such as the jersey number we use for identification. Our biggest challenge was acquiring the data to train this model. In the past we handled data acquisition by labeling the data ourselves or by outsourcing it to external parties. Labeling data this way is costly in both money and time, and on top of that the resulting deep learning model was prone to error. We were labeling footage from our own camera systems, which only gave good results as long as we did not encounter new situations: different weather, camera settings, team jerseys, or even unseen numbers in unfamiliar fonts. We realised that human labeling, even with an additional quality-check step, always comes with human error.
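The core idea behind generated training data can be sketched in a few lines: start from a clean, labelled template and programmatically vary it, so every example carries a label that is correct by construction. The specific augmentations below (lighting, sensor noise, a random occlusion patch) are illustrative assumptions, not the actual BallJames generation pipeline.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(template: np.ndarray) -> np.ndarray:
    """Produce one randomized training example from a labelled template image.

    Because the label travels with the template, no human ever labels the
    output, so the label-error rate of the generated set is zero.
    """
    img = template.astype(float)
    img *= rng.uniform(0.6, 1.4)                 # lighting / exposure variation
    img += rng.normal(0, 10, img.shape)          # sensor noise
    h, w = img.shape
    oh = int(rng.integers(h // 4, h // 2))       # occlusion patch height
    ow = int(rng.integers(w // 4, w // 2))       # occlusion patch width
    y = int(rng.integers(0, h - oh))
    x = int(rng.integers(0, w - ow))
    img[y:y + oh, x:x + ow] = 0                  # simulate hair, arms, folds
    return np.clip(img, 0, 255).astype(np.uint8)

# Stand-in "digit" image; a real pipeline would render numbers in many fonts.
template = np.full((32, 20), 255, dtype=np.uint8)
batch = np.stack([augment(template) for _ in range(64)])
print(batch.shape)  # (64, 32, 20)
```

A model trained on thousands of such variations has, by construction, already seen occluded and badly lit numbers long before they occur in a live match.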
Outperforming the human
We solved this vulnerability with a unique method for generating huge quantities of varied data that is free of labeling errors. Training our model on this new dataset makes it robust to future situations and even lets it outperform the human eye. The system now handles cases that are very challenging, or even counterintuitive, for a human: for example, an 8 whose bottom-left part is occluded. A human would likely read a 9 there, but our system can tell from the rest of the digit that it is looking at an 8. Players with long hair can also occlude a number considerably; in past matches we could not make out such numbers ourselves, yet the system could tell us what was underneath all that hair.
With our new and improved systems, we can detect and identify everything we are interested in on the pitch. Our new data acquisition software provides a stable environment in which to train our own models for the latest technologies. Upcoming innovations to BallJames include limb tracking, so that we can recreate a player's skeleton in 3D, and the ability to plug into any video feed and generate accurate 3D data from anywhere; we will likely take a closer look at that in a future blog.