A View from Emerging Technology From the arXiv
PageRank Algorithm Reveals Soccer Teams' Strategies
Using network theory to analyse the performance of soccer teams and players produces unique insights into the strategy of the world’s best team
Many readers will have watched the final of the Euro 2012 soccer championships on Sunday in which Spain demolished a tired Italian team by 4 goals to nil. The result, Spain’s third major championship in a row, confirms the team as the best in the world and one of the greatest in history.
So what makes Spain so good? Fans, pundits and sports journalists all point to Spain’s famous strategy of accurate quick-fire passing, known as the tiki-taka style. It’s easy to spot and fabulous to watch, as the game on Sunday proved. But it’s much harder to describe and define.
That looks set to change. Today, Javier Lopez Pena at University College London and Hugo Touchette at Queen Mary University of London reveal an entirely new way to analyse and characterise the performance of soccer teams and players using network theory.
They say their approach produces a quantifiable representation of a team’s style, identifies key individuals and highlights potential weaknesses.
Their idea is to think of each player as a node in a network and each pass as an edge that connects nodes. They then distribute the nodes in a way that reflects the playing position of each player on the pitch.
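To make the construction concrete, here is a minimal sketch of how such a network might be assembled from a log of passes. The pass log below is invented for illustration and is not taken from Pena and Touchette's data; the function name `build_pass_network` is likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical pass log: (passer shirt number, receiver shirt number).
# These values are made up for illustration, not taken from the paper.
passes = [(16, 8), (8, 16), (16, 8), (8, 6), (6, 8), (3, 16), (16, 14), (14, 8)]

def build_pass_network(pass_log):
    """Return {passer: {receiver: count}}: a directed, weighted network."""
    network = defaultdict(lambda: defaultdict(int))
    for passer, receiver in pass_log:
        # Each pass adds one unit of weight to the edge passer -> receiver.
        network[passer][receiver] += 1
    return {passer: dict(receivers) for passer, receivers in network.items()}

network = build_pass_network(passes)

# Total number of passes recorded in the network.
total_passes = sum(sum(receivers.values()) for receivers in network.values())
```

The edge weights here play the role of the arrow thickness in the diagrams: the more passes between two players, the heavier the edge.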
The image above shows the resulting networks for the Netherlands (left) and Spain (right), using data from the knockout stages of the 2010 World Cup in South Africa. These teams contested the final, which Spain won.
A visual inspection of these networks immediately reveals some interesting insights into the match. The thickness of the arrows represents the number of passes between nodes and it is immediately clear that the Spanish team pass more often. This image captures 417 passes by the Spanish team versus 266 for the Netherlands.
Key players also stand out by the number of passes they make and receive, such as 16 (Sergio Busquets) and 8 (Xavi).
However, this representation also allows a much more sophisticated analysis using the standard tools of network science.
For example, closeness centrality measures how easy it is to reach a given node in the network. In footballing terms, it measures how well connected a player is in the team.
Busquets and Xavi have the highest scores in the Spanish team. Both are better connected than the best-connected Dutch player, 1 (Stekelenburg), the goalkeeper. That the goalkeeper was the Netherlands' best-connected player itself speaks volumes.
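One simple way to compute a closeness score is sketched below. It counts the fewest passes needed for the ball to travel from every other player to a given player, ignoring how often each link is used. That simplification is an assumption made here for brevity; the paper may weight links differently. The four-player network is hypothetical.

```python
from collections import deque

# Hypothetical directed pass links: passer -> set of receivers.
links = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B"},
}

def hops_from(source, links):
    """Breadth-first search: fewest passes from source to each reachable player."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        player = queue.popleft()
        for receiver in links.get(player, ()):
            if receiver not in dist:
                dist[receiver] = dist[player] + 1
                queue.append(receiver)
    return dist

def closeness(target, links):
    """Players able to reach target, divided by the sum of their hop distances."""
    distances = [hops_from(p, links).get(target) for p in links if p != target]
    reachable = [d for d in distances if d is not None]
    return len(reachable) / sum(reachable) if reachable else 0.0

scores = {player: closeness(player, links) for player in links}
```

In this toy network, player B can be reached from everyone in a single pass and so gets the highest score, which is the sense in which a player is "well connected".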
Another notion is betweenness centrality, which measures the extent to which a node lies on the paths between other nodes. In footballing terms, betweenness centrality measures how much the flow of the ball between players depends on one particular player. Players with a high betweenness centrality are crucial for keeping the momentum of the game going.
These players are important because removing them has a huge impact on the structure of the network. So a single player with a high betweenness centrality is also a weakness, since the entire team is vulnerable to an injury to this player or a red card.
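As a rough illustration, betweenness can be computed with Brandes' algorithm, a standard method from network science. The sketch below treats every link as equally important (unweighted), which is an assumption made here and not necessarily the definition used in the paper, and the network is hypothetical.

```python
from collections import deque

# Hypothetical directed pass links. Every receiver must also appear as a key.
links = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B"],
    "D": ["B"],
}

def betweenness(links):
    """Brandes' algorithm on an unweighted directed graph (unnormalised)."""
    score = {p: 0.0 for p in links}
    for source in links:
        order = []                        # players in order of discovery
        preds = {p: [] for p in links}    # predecessors on shortest paths
        n_paths = {p: 0 for p in links}   # number of shortest paths from source
        dist = {p: -1 for p in links}     # -1 means not yet reached
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in links[v]:
                if dist[w] < 0:           # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest players back to the source.
        dependency = {p: 0.0 for p in links}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    return score

scores = betweenness(links)
```

Here every route between A, C and D runs through B, so B collects the entire score. Remove B and the other three players can no longer reach one another, which is exactly the vulnerability described above.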
Spain’s number 11, Joan Capdevila, is the player with by far the highest betweenness centrality in this match. He is clearly a target for passes from many players, which he feeds mainly to 14 (Xabi Alonso).
Then there is the famous PageRank algorithm, which measures a player’s popularity, as judged by the number of passes he receives from other popular players. It gives a rough idea of who is most likely to end up with the ball after a suitably large number of passes. In this game it is Xavi.
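The idea can be sketched with a simple power iteration over pass counts. The damping value of 0.85 is the conventional PageRank default and is an assumption here, as are the pass counts; the paper's exact formulation may differ.

```python
# Hypothetical pass counts: passes[passer][receiver] = number of passes.
# Every player in this toy example makes at least one pass.
passes = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 2, "C": 3},
    "C": {"B": 5},
}

def pagerank(passes, damping=0.85, iterations=100):
    """Power iteration: repeatedly pass each player's rank on to his receivers."""
    players = list(passes)
    n = len(players)
    rank = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        # Baseline share every player gets regardless of passes received.
        new_rank = {p: (1.0 - damping) / n for p in players}
        for passer, receivers in passes.items():
            total = sum(receivers.values())
            for receiver, count in receivers.items():
                # The passer hands on rank in proportion to where his passes go.
                new_rank[receiver] += damping * rank[passer] * count / total
        rank = new_rank
    return rank

rank = pagerank(passes)
most_likely = max(rank, key=rank.get)
```

The resulting scores sum to one and can be read as the long-run share of possession each player would have if the ball simply followed the observed passing pattern.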
Seven members of the starting team that won the 2010 World Cup also started the Euro 2012 final. It’ll be interesting to see Pena and Touchette’s analysis of this tournament and how it differs from the earlier one.
There are clearly limitations to this approach. The data is an average over several games, so it fails to capture the dynamics of specific games. The positions of the nodes are also a broad generalisation, taken only from each player’s nominal starting position.
Pena and Touchette say there are various ways in which this approach could be improved. They suggest adding another node to represent the opponent’s goal and recording the number of shots as links to it. They also imagine using a similar approach to measure the accuracy of passes by taking into account the probability of a pass from one player to another being successful.
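A minimal sketch of what those two extensions might look like in practice is given below. The event data, the `GOAL` label and the variable names are all hypothetical; this is one possible reading of the suggestion, not the authors' implementation.

```python
from collections import Counter

# Hypothetical attempted passes: (passer, intended receiver, completed?).
attempted_passes = [
    (8, 16, True),
    (8, 16, True),
    (8, 16, False),
    (16, 8, True),
    (6, 8, False),
]

# Hypothetical shots, recorded by the shirt number of the shooter.
shots = [8, 6, 8]

GOAL = "goal"  # extra node standing in for the opponent's goal

# Estimated probability that a pass from one player to another succeeds.
attempted = Counter((p, r) for p, r, _ in attempted_passes)
completed = Counter((p, r) for p, r, ok in attempted_passes if ok)
success_probability = {pair: completed[pair] / attempted[pair] for pair in attempted}

# Shots become weighted edges from the shooter to the goal node.
shot_edges = Counter((shooter, GOAL) for shooter in shots)
```

With the goal treated as just another node, the same centrality measures could in principle be applied to it, although how meaningful the results are would depend on how shots are weighted against passes.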
“The defensive strength of a team could also be incorporated in the model by tracking passing interceptions and recovered balls,” say Pena and Touchette.
Perhaps more fascinating would be a way of collecting and analysing the data in real time to produce a network-based analysis of a game as it happens.
In terms of data analysis, football has always lagged behind more statistically-friendly games such as American football, baseball and cricket, because it lacks the long pauses during which data can be gathered and analysed. That looks set to change.
Ref: arxiv.org/abs/1206.6904: A Network Theory Analysis Of Football Strategies