Data Mining Reveals the Way Humans Evaluate Each Other

Vast databases of soccer statistics expose the limited way human observers rate performance and suggest how they can do significantly better.

The way we evaluate the performance of other humans is one of the bigger mysteries of cognitive psychology. This process occurs continuously as we judge individuals’ ability to do certain tasks, assessing everyone from electricians and bus drivers to accountants and politicians.

The problem is that we have access to only a limited set of data about an individual’s performance—some of it directly relevant, such as a taxi driver’s driving record, but much of it irrelevant, such as the driver’s sex. Indeed, the amount of information may be so vast that we are forced to decide using a small subset of it. How do those decisions get made?

Today we get an answer of sorts thanks to the work of Luca Pappalardo at the University of Pisa in Italy and a few pals who have studied this problem in the sporting arena, where questions of performance are thrown into stark relief. Their work provides unique insight into the way we evaluate human performance and how this relates to objective measures.

The factors human observers use to rate performance are a small subset of objective measures.

Sporting performance is one area where detailed records of individual performance have been gathered for some years. Pappalardo and co focus on soccer, the world’s most popular sport, and in particular on the performance of players competing at the top of the sport in Italy’s Serie A football league.

For many years, Italian sports newspapers have rated the performance of players in every game on a scale of 0 to 10, where 0 is unforgettably bad and 10 unforgettably amazing. This system is based on the Italian system of school ratings, where a 6 indicates that a pupil has performed adequately. The way the players are rated is not published, but it is presumably done by an expert sports journalist.

In recent years, the same players have also been evaluated by an objective measurement system that counts the number of passes, shots, tackles, saves, and so on that each player makes. This technical measure takes into account 150 different parameters and provides a comprehensive account of every player’s on-pitch performance.

The question that Pappalardo and co ask is how the newspaper ratings correlate with the technical ratings, and whether it is possible to use the technical data to understand the factors that influence human ratings.

The researchers start with the technical data set of 760 games in Serie A in the 2015-16 and 2016-17 seasons. This consists of over a million data points describing time-stamped on-pitch events. They use the data to extract a technical performance vector for each player in every game; this acts as an objective measure of his performance.
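To make the idea of a performance vector concrete, here is a minimal sketch in Python. It assumes each event is a record with a game, a player, a time stamp, and an event type. The field names, the four event types, and the sample events are all invented for illustration; the real data set uses 150 parameters.

```python
from collections import Counter

# Hypothetical event types; the real data set tracks about 150 parameters.
FEATURES = ["pass", "shot", "tackle", "save"]

def performance_vectors(events):
    """Count events per (game, player) and return fixed-order count vectors."""
    counts = {}
    for event in events:
        key = (event["game"], event["player"])
        counts.setdefault(key, Counter())[event["type"]] += 1
    # A Counter returns 0 for event types a player never produced.
    return {key: [c[f] for f in FEATURES] for key, c in counts.items()}

# Made-up time-stamped events, for illustration only.
events = [
    {"game": 1, "player": "A", "time": 12.5, "type": "pass"},
    {"game": 1, "player": "A", "time": 40.0, "type": "shot"},
    {"game": 1, "player": "A", "time": 41.2, "type": "pass"},
    {"game": 1, "player": "B", "time": 55.0, "type": "save"},
]
vectors = performance_vectors(events)
```

Each player in each game ends up with one list of counts in a fixed order, which is the kind of input a statistical model can work with.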

The researchers also have the ratings for each player in every game from three sports newspapers: Gazzetta dello Sport, Corriere dello Sport, and Tuttosport.

The newspaper ratings have some interesting statistical properties. Only 3 percent of the ratings are lower than 5, and only 2 percent higher than 7. When the ratings are categorized in line with the school ratings system—as bad if they are lower than 6 and good if they are 7 and above—bad ratings turn out to be three times as common as good ones.
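The categorization described above is simple enough to sketch in a few lines of Python. The thresholds come from the article; the label "adequate" for the middle band and the sample ratings are assumptions made for this example, not data from the study.

```python
from collections import Counter

def categorize(rating):
    """Label a rating using the school-style thresholds described above."""
    if rating < 6:
        return "bad"
    if rating >= 7:
        return "good"
    return "adequate"  # assumed label for ratings from 6 up to (not including) 7

# Made-up ratings, chosen only to illustrate the counting.
ratings = [5.5, 6.0, 6.5, 7.0, 5.0, 6.0, 5.5, 6.5]
category_counts = Counter(categorize(r) for r in ratings)
bad_to_good = category_counts["bad"] / category_counts["good"]
```

On real data, the ratio of bad to good ratings is the quantity the researchers report as roughly three to one.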

In general, the newspapers rate a performance similarly, although there can be occasional disagreements by up to 6 points. “We observe a good agreement on paired ratings between the newspapers, finding that the ratings (i) have identical distributions; (ii) are strongly correlated to each other; and (iii) typically differ by one rating unit (0.5),” say Pappalardo and co.
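One plausible way to check this kind of agreement is to compute the correlation between two newspapers' ratings and the typical gap between them. The sketch below uses the Pearson correlation and the median absolute difference; the paper may use different statistics, and the two rating lists are invented.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up ratings from two hypothetical newspapers for the same five players.
paper_a = [6.0, 6.5, 5.5, 7.0, 6.0]
paper_b = [6.5, 6.5, 5.0, 7.5, 6.0]

correlation = pearson(paper_a, paper_b)
typical_gap = statistics.median(abs(a - b) for a, b in zip(paper_a, paper_b))
```

A correlation close to 1 and a typical gap of one rating unit (0.5) would match the pattern the researchers describe.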

To analyze the relationship between the newspaper ratings and the technical ratings, Pappalardo and co use machine learning to find correlations in the data sets. In particular, they create an “artificial judge” that attempts to reproduce the newspaper ratings from a subset of the technical data.

This leads to a curious result. The artificial judge can match the newspaper ratings with a reasonable degree of accuracy, but not as well as the newspapers match each other. “The disagreement indicates that the technical features alone cannot fully explain the [newspaper] rating process,” say Pappalardo and co.

In other words, the newspaper ratings must depend on external factors that are not captured by the technical data, such as the expectation of a certain result, personal bias, and so on.

To test this idea, Pappalardo and co gathered another set of data that captures external factors. These include the age, nationality, and club of the player, the expected game outcome as estimated by bookmakers, the actual game outcome, and whether a game is played at home or away.

When this data is included, the artificial judge does much better. “By adding contextual information, the statistical agreement between the artificial judge and the human judge increases significantly,” say the team.
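The effect of adding context can be illustrated with a toy model. The sketch below is not the method used by Pappalardo and co. It swaps in plain linear least squares, and the features (goals scored, whether the team won) and the ratings are invented so that the rating depends on both. The model is fitted and scored on the same handful of rows, so it shows the idea rather than a proper evaluation.

```python
def fit_least_squares(rows, targets):
    """Fit intercept and weights by solving the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * t for x, t in zip(X, targets))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(weights, row):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))

def mean_abs_error(weights, rows, targets):
    return sum(abs(predict(weights, r) - t)
               for r, t in zip(rows, targets)) / len(rows)

# Invented data: (goals scored, team won) and the rating a journalist gave.
full_rows = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (0, 0)]
newspaper = [5.5, 6.0, 6.0, 6.5, 7.0, 5.5]
technical_rows = [(goals,) for goals, _ in full_rows]

technical_judge = fit_least_squares(technical_rows, newspaper)
contextual_judge = fit_least_squares(full_rows, newspaper)

err_technical = mean_abs_error(technical_judge, technical_rows, newspaper)
err_contextual = mean_abs_error(contextual_judge, full_rows, newspaper)
```

Because the invented ratings reward winning as well as scoring, the judge that sees only goals cannot reproduce them exactly, while the judge that also sees the result can.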

Indeed, they can clearly see examples of the way external factors influence the newspaper ratings. In the entire data set, only two players have ever been awarded a perfect 10. One of these was the Argentine striker Gonzalo Higuaín, who played for Napoli. On this occasion, he scored three goals in a game, and in doing so he became the highest-ever scorer in a season in Serie A. That milestone was almost certainly the reason for the perfect rating, but there is no way to derive this score from the technical data set.

An important question is what factors the artificial judge uses to match the newspaper ratings. “We observe that most of a human judge’s attention is devoted to a small number of features, and the vast majority of technical features are poorly considered or discarded during the evaluation process,” say Pappalardo and co.

So newspapers tend to rate attacking forward players using easily observed factors such as the number of goals scored, and they rate goalkeepers on the number of goals conceded. Midfield players tend to be rated by more general parameters such as the goal difference.

That makes sense: human observers have a limited bandwidth and are probably capable of observing only a small fraction of performance indicators. Indeed, the team say the artificial judge can match human ratings using fewer than 20 of the technical and external factors.

That’s a fascinating result that has important implications for the way we think about performance ratings. The goal, of course, is to find more effective ways of evaluating performance in all kinds of situations. Pappalardo and co think their work has a significant bearing on this. “This paper can be used to empower human evaluators to gain understanding on the underlying logic of their decisions,” they conclude.

Ref: arxiv.org/abs/1712.02224 : Human Perception of Performance
