Big Data Analysis Is Changing the Nature of Sports Science
When it’s possible to record the exact movements of players in team games such as football, basketball, and so on, how can algorithms crunch this data to provide meaningful insight?
The best-selling book Moneyball by Michael Lewis changed the way people thought about sport, particularly for those owners, managers, and players with the biggest vested interests. Lewis’s book helped bring about a revolution in which player performance was measured and assessed using an evidence-based approach rather than a tradition dominated by anecdote and intuition.
Since then, sports scientists have attempted to replicate the success of this approach in sports such as basketball, soccer, American football, and so on. This science is driven by the relatively new ability to gather vast amounts of data about the players and the play while the game is in progress.
However, in many of these sports, the capacity to gather data has not been matched by an ability to process it in meaningful ways. So an interesting question is what challenges sports sciences face in crunching this data effectively. What are the open questions in this rapidly evolving field?
Today we get an answer thanks to the work of Joachim Gudmundsson and Michael Horton at the University of Sydney in Australia, who have reviewed this field and listed the outstanding challenges that researchers face in making analytics meaningful.
The sports they consider are collectively known as invasion games. They all consist of two teams that compete for possession of a ball in a constrained playing area. Each team has the simultaneous objective of scoring by putting the ball into the opposition’s goal and also of defending its own goal. The team that scores the greatest number of goals by the end of the game is the winner.
Invasion sports that share this structure include soccer, basketball, ice hockey, field hockey, rugby, Australian rules football, American football, lacrosse and so on. However, most of the data comes from games such as professional soccer and basketball, which have the resources to gather it.
This data generally consists of player and ball trajectories throughout the game, and event logs that describe events such as passes, shots, tackles, and so on at specific times. “State of the art object tracking systems now produce spatio-temporal traces of player trajectories with high definition and high frequency, and this, in turn, has facilitated a variety of research efforts, across many disciplines, to extract insight from the trajectories,” say Gudmundsson and Horton.
The big challenge in sports science is to use this data to gain a competitive advantage, whether in real time during the game or to help in training, preparation, or recruitment. But while researchers have made significant progress, there are also important hurdles barring the way.
One of the most significant involves understanding how players can dominate parts of the pitch near them. In sports science, a player’s dominant region is the region he or she can reach before any other player. A simple way to calculate this is to draw a Voronoi diagram, which divides the pitch into the regions closest to each player (see diagram).
Such a diagram can be modified with the help of other information, such as the observation that dominant regions tend to be larger for the attacking team than the defending team.
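To make the Voronoi idea concrete, here is a minimal Python sketch. It is an illustration only: the 105 m by 68 m pitch, the 1 m sampling grid, and the player ids and positions are all assumptions made up for the example, not values from the survey. Rather than constructing the diagram geometrically, it approximates each player's region by assigning every grid cell to its nearest player and adding up the area.

```python
import math

def nearest_player(point, players):
    """Return the id of the player closest to `point` (straight-line distance)."""
    return min(players, key=lambda pid: math.dist(point, players[pid]))

def dominant_region_areas(players, length=105.0, width=68.0, cell=1.0):
    """Approximate each player's Voronoi cell area by sampling grid-cell centres.

    `length`, `width` and `cell` are assumed values in metres.
    """
    areas = {pid: 0.0 for pid in players}
    nx, ny = int(length / cell), int(width / cell)
    for i in range(nx):
        for j in range(ny):
            centre = ((i + 0.5) * cell, (j + 0.5) * cell)
            areas[nearest_player(centre, players)] += cell * cell
    return areas

# Hypothetical player positions in metres (two players per team).
players = {
    "A1": (30.0, 34.0),
    "A2": (50.0, 20.0),
    "B1": (70.0, 34.0),
    "B2": (60.0, 50.0),
}

areas = dominant_region_areas(players)
for pid, area in sorted(areas.items()):
    print(pid, round(area, 1), "square metres")
```

This version treats every player as stationary and equally fast, which is exactly the simplification a pure nearest-player Voronoi diagram makes.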
However, calculating the Voronoi diagram for each player on the pitch is computationally expensive. Nobody has successfully done it in real time, even for RoboCup football.
Instead, researchers calculate a different property—the region each player can reach in a given time—and then look for overlaps, which are then resolved. This increases speed by a factor of 1,000 at a cost of a 10 percent loss in accuracy.
But even then, this approach ignores a number of crucial factors. Perhaps the most significant is that it takes no account of the players’ momentum. Clearly, a player in motion can dominate a greater region ahead than a stationary player.
This can lead to complex subdivisions of the pitch. When player A runs at an opposing player B who is stationary, each may have more than one dominant region, and these may not be connected to each other. For example, player A’s momentum gives better access to some, but not all, of the region behind B.
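A toy motion model shows how this happens. The Python sketch below is not the model used in the survey; it assumes that after time t a player can be anywhere inside a disc of radius 0.5 * accel * t^2 centred on where their current velocity would have carried them, with an assumed acceleration of 3 m/s^2 and no cap on top speed. The positions and velocities of players A and B are hypothetical.

```python
import math

def time_to_reach(pos, vel, target, accel=3.0, dt=0.01, t_max=20.0):
    """Earliest sampled time at which `target` lies inside the reachable disc.

    Assumed model: at time t the player can be anywhere within a disc of
    radius 0.5 * accel * t**2 centred on pos + vel * t. Time is scanned
    forward in steps of `dt` so that the first arrival time is returned.
    """
    steps = int(t_max / dt)
    for k in range(steps + 1):
        t = k * dt
        centre = (pos[0] + vel[0] * t, pos[1] + vel[1] * t)
        if math.dist(centre, target) <= 0.5 * accel * t * t:
            return t
    return math.inf

def dominant_player(players, target):
    """Return the id of the player who reaches `target` first under the model."""
    return min(players, key=lambda pid: time_to_reach(*players[pid], target))

# Hypothetical scene: A runs at 6 m/s straight at a stationary B, 10 m away.
players = {
    "A": ((0.0, 0.0), (6.0, 0.0)),
    "B": ((10.0, 0.0), (0.0, 0.0)),
}

for x in (2.0, 12.0, 20.0):
    print((x, 0.0), "->", dominant_player(players, (x, 0.0)))
```

Under these assumptions, B controls the ground just behind their own position while A, carried by momentum, gets first to a point farther back, so A's territory along that line is split into two separate pieces.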
So an important open problem in sports science is how to calculate realistic dominant regions in real time.
Another related challenge is to work out whether a player is open to receive a pass. That means determining whether there is some speed and direction at which the ball can be passed so that a given player can reach it before any other player.
This is obviously linked to the player’s dominant region. Given an accurate idea of what that region is, it’s straightforward to work out a straight-line pass that falls within it. Indeed, that is how current tools work.
The problem is that only certain trajectories meet the criterion of being straight-line passes. An aerial trajectory, for example, is not a straight-line pass. No tool yet exists that can handle these (or more complex motions involving the spin of the ball), and this is another open problem in sports science.
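The straight-line case can be sketched in a few lines of Python. This is a simplified illustration, not any existing tool: it assumes the ball rolls at constant speed along the ground, that every player starts from rest and runs at the same constant 7 m/s, and that a handful of sampled speeds and directions is enough. All function names and numbers are hypothetical.

```python
import math

def first_interceptor(passer, angle, ball_speed, players,
                      player_speed=7.0, step=0.5, max_range=60.0):
    """Walk along the pass line and return the first player able to meet the ball.

    `players` maps ids to (x, y) positions and excludes the passer.
    Returns None if nobody can reach the ball within `max_range` metres.
    """
    if not players:
        return None
    for k in range(1, int(max_range / step) + 1):
        d = k * step
        t_ball = d / ball_speed
        point = (passer[0] + d * math.cos(angle), passer[1] + d * math.sin(angle))
        arrivals = {pid: math.dist(pos, point) / player_speed
                    for pid, pos in players.items()}
        quickest = min(arrivals, key=arrivals.get)
        if arrivals[quickest] <= t_ball:
            return quickest
    return None

def is_open(passer, receiver, players,
            ball_speeds=(10.0, 15.0, 20.0), n_angles=72):
    """True if some sampled speed and direction lets `receiver` get there first."""
    for speed in ball_speeds:
        for i in range(n_angles):
            angle = 2 * math.pi * i / n_angles
            if first_interceptor(passer, angle, speed, players) == receiver:
                return True
    return False

# Hypothetical scene: an opponent "O" stands on the direct line to receiver "R".
scene = {"R": (20.0, 0.0), "O": (10.0, 0.0)}
print(first_interceptor((0.0, 0.0), 0.0, 10.0, scene))
print(is_open((0.0, 0.0), "R", scene))
```

Nothing here handles a lofted or curling ball, which is precisely the gap described above.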
Then there is the way that one player can put pressure on other players by closing down the space around them. How can this be measured and incorporated into models?
An increasingly important area of sporting analysis involves network science. This treats each player as a node and draws a line between them when the ball travels from one to the other. This has been a fruitful area of research because a wide range of mathematical tools have already been developed for analyzing networks.
For example, it is straightforward to work out the most important nodes in the network using a measure known as centrality. In soccer, goalkeepers and forwards have the lowest centrality, while defenders and midfielders have the highest.
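As a rough illustration, the Python sketch below builds a pass network from an event log and ranks players by one simple measure, degree centrality (the share of all pass involvements that go through each player). The pass list and position labels are invented for the example, and degree centrality is just one convenient choice among several measures.

```python
from collections import Counter

def pass_network(passes):
    """Count passes on each directed (passer, receiver) edge."""
    return Counter(passes)

def degree_centrality(edges):
    """Fraction of all pass involvements (made or received) for each player."""
    involvement = Counter()
    for (src, dst), n in edges.items():
        involvement[src] += n
        involvement[dst] += n
    total = sum(involvement.values())
    return {player: count / total for player, count in involvement.items()}

# Hypothetical event log: (passer, receiver) pairs in the order they occurred.
passes = [
    ("GK", "CB"), ("CB", "CM"), ("CM", "CB"), ("CM", "FW"),
    ("CB", "CM"), ("CM", "LB"), ("LB", "CM"),
]

centrality = degree_centrality(pass_network(passes))
for player in sorted(centrality, key=centrality.get, reverse=True):
    print(player, round(centrality[player], 3))
```

In this made-up log the central midfielder comes out on top and the goalkeeper and forward at the bottom, but that reflects the invented data rather than any real match.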
The same kind of science also allows the network to be divided into clusters. So some team members might only pass to each other or work more effectively together.
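One crude way to find such clusters, sketched below in Python, is to keep only the links between pairs of players who exchange at least some minimum number of passes and then group whoever remains connected. This thresholding approach is a stand-in chosen for brevity, not one of the methods the survey reviews, and the threshold and pass data are hypothetical.

```python
from collections import Counter

def pass_clusters(passes, min_passes=3):
    """Group players joined by at least `min_passes` passes in either direction."""
    pair_counts = Counter(frozenset(p) for p in passes if p[0] != p[1])
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for src, dst in passes:
        find(src)
        find(dst)
    for pair, n in pair_counts.items():
        if n >= min_passes:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = {}
    for player in list(parent):
        groups.setdefault(find(player), set()).add(player)
    return sorted(groups.values(), key=lambda g: sorted(g))

# Hypothetical log: strong links on each flank, a single pass across the pitch.
passes = ([("LB", "LM")] * 4 + [("RB", "RM")] * 3
          + [("LM", "RM")] + [("RM", "RB")])
print(pass_clusters(passes))
```

Changing `min_passes` changes the clusters, which hints at how much the answer can depend on the method chosen.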
However, the problem with network science is that there are numerous different ways of measuring centrality and determining clusters, and it is not always clear why one method should be preferred over another. So another open problem is to systematically evaluate and compare these different methods to determine their utility and value.
Another class of problems comes from analyzing game-play data. For example, given the list of player trajectories and event logs for a period during the game, is it possible to determine the team formation (for example, 4-4-2 in soccer) or the type of marking used by the defensive team, such as a full-court press or a zone defense in basketball?
There is some evidence that this can be done some of the time in certain sports. But matching or beating human performance in this is still the goal.
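To give a flavor of the formation problem, here is a deliberately naive Python heuristic. It assumes each outfield player's average position over the period has already been computed from the trajectories, with x measuring distance from the team's own goal, and it simply splits the players into lines at the largest gaps in depth. It is a toy, not a method from the survey, and the positions are invented.

```python
def infer_formation(mean_positions, n_lines=3):
    """Split outfield players into lines at the largest gaps in depth (x).

    `mean_positions` is a list of (x, y) average positions, goalkeeper excluded.
    Returns a string such as "4-4-2", counted from defence to attack.
    """
    xs = sorted(x for x, _ in mean_positions)
    # Indices where a new line could start, ordered by gap size, largest first.
    by_gap = sorted(range(1, len(xs)),
                    key=lambda i: xs[i] - xs[i - 1], reverse=True)
    cuts = sorted(by_gap[:n_lines - 1])
    bounds = [0] + cuts + [len(xs)]
    return "-".join(str(b - a) for a, b in zip(bounds, bounds[1:]))

# Hypothetical average positions (metres) for ten outfield players.
mean_positions = [
    (20.0, 10.0), (20.5, 27.0), (21.0, 41.0), (22.0, 58.0),
    (44.0, 12.0), (45.0, 28.0), (46.0, 40.0), (47.0, 56.0),
    (70.0, 26.0), (72.0, 42.0),
]
print(infer_formation(mean_positions))
```

Real teams shift shape constantly and players swap roles, so a heuristic like this breaks down quickly, which is part of why the problem remains hard.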
Gudmundsson and Horton describe various other open problems and how ideas developed in sports such as football and basketball could usefully be applied in other invasion sports, such as hockey and handball.
But perfecting algorithms that can solve these problems is only half the battle. The next stage will be to ask how these tools can help improve performance both on and off the field. Can they be used as a metric of player performance and value? Can they determine whether a player who is successful on one team will also be successful on another? And can they work in real time during a game to help coaches and fans alike?
There are likely to be significant developments in the coming years. Clearly there are exciting times ahead for data analysts in sport.
Ref: arxiv.org/abs/1602.06994 : Spatio-Temporal Analysis of Team Sports – A Survey