When humans try to figure out how someone is feeling, we use a lot of information: facial expressions, body language, where that person is, and more. When computers try to do the same thing, they tend to focus only on the face. That’s a big flaw, and according to an important new study, it suggests that most claims made by “emotion recognition” companies are wrong.
Emotion recognition—or using technology to analyze facial expressions and infer feelings—is, by one estimate, set to be a $25 billion business by 2023. Huge companies like Microsoft and Apple, as well as specialized startups like Kairos and Affectiva, are all taking part. Though most commonly used to sell products, emotion recognition technology has also popped up in job recruiting and as a possible tool for figuring out if someone is trying to commit insurance fraud. Back in 2003, the US Transportation Security Administration started training humans to spot potential terrorists by “reading” their facial expressions, so it’s easy to imagine an artificial-intelligence project attempting the same thing. (The TSA program was widely criticized for being based on poor science.)
But for years now, there has been growing pushback against the belief that facial expressions are easy giveaways to feelings. A group of scientists brought together by the Association for Psychological Science spent two years reviewing more than 1,000 papers on emotion detection. They focused on research into how people move their faces when they feel certain emotions, and how people infer other people’s emotional states from their faces. The group concluded that it’s very hard to use facial expressions alone to accurately tell how someone is feeling.
People do smile when they’re happy and frown when they’re sad, but the correlation is weak, says study coauthor Lisa Feldman Barrett, a psychologist at Northeastern University. People do plenty of other things when they’re happy or sad too, and a smile can be wry or ironic. Their behaviors vary a lot across cultures and situations, and context plays a big role in how we interpret expressions. For example, in studies where someone placed a picture of a positive face on the body of someone in a negative situation, people experienced the face as more negative.
In short, the expressions we’ve learned to associate with emotions are stereotypes, and technology based on those stereotypes doesn’t provide very good information. Getting emotion recognition right is expensive and requires collecting a lot of extremely specific data—more, Barrett says, than anyone has so far.
The danger of not enough data
Most of the companies I asked for comment on this story, including Apple and Microsoft, didn’t respond. One that did, Kairos, promises retailers that it can use emotion recognition technology to figure out how their customers are feeling. By scanning the customers’ faces and analyzing a raised eyebrow or a smile to tell whether someone is happy or sad, Kairos provides the kind of data that can be hard for brick-and-mortar companies to collect, says CEO Melissa Doval.
To train its technology, Kairos had people watch emotion-provoking videos and scanned their faces. Some other data came from posed expressions. One person at the company is in charge of labeling that data to feed the algorithm.
This is an extremely common approach, but it has two big weaknesses, according to the new review. One is the posed faces. If you’re told to make a surprised face, it may be very different from how your face actually looks when you’re surprised. The other problem is having a third party go through and label this data. An observer might read a facial expression as “surprised,” but without asking the original person, it’s hard to know what the real emotion was.
The result is a technology with fairly rudimentary abilities. For her part, Doval says the company is currently focusing on improving its camera and dashboard instead of the emotion technology itself. She adds that the company would eventually be interested in taking research like Barrett’s into consideration and in adding demographic data to provide more context and make the algorithm more accurate.
The danger of getting it right
Barrett has suggestions for how to do emotion recognition better. Don’t use single photos, she says; study individuals in different situations over time. Gather a lot of context—like voice, posture, what’s happening in the environment, physiological information such as what’s going on with the nervous system—and figure out what a smile means on a specific person in a specific situation. Repeat, and see if you can find some patterns in people with similar characteristics like gender. “You don’t have to measure everybody always, but you can measure a larger number of people that you sample across cultures,” she says. “I think we all gravitate naturally toward this Big Data approach. This is now possible to do, whereas even a decade ago it was much harder.”
This method is more similar to the approach of companies like Boston-based Affectiva. Affectiva cofounder and CEO Rana el Kaliouby agrees that the current understanding of emotions is oversimplified. The company’s own analysis, for example, has shown that there are at least five different types of smiles, from a flirtatious smile to a polite one. Affectiva collects data from 87 countries, records people in real-life situations (like when they’re driving), and asks participants for self-reports on how they feel. “Is it a solved problem? It’s not at all,” el Kaliouby says. Affectiva’s technology is better at classifying “joy,” for example, than it is at differentiating fear, anger, and disgust.
For accuracy, more data is better. But collecting so much personal data has pitfalls too, as the ongoing debates around facial recognition show. Consumers are increasingly afraid of losing privacy, or having their data used against them. “That’s something that should be a worry for any of these systems,” says Tiffany Li, a privacy researcher with Yale University’s Information Society Project. “The issue is having the right safeguards.” We need to know, for example, where the data is coming from, how it’s being collected, and how it’s being stored. Will the data be sold or transferred? Will it be linked to any other data sets that might have identifying information?
Affectiva says it refuses to work with surveillance or lie-detection companies. Academics usually have strict limits on how they can collect and share data. But the private sector isn’t governed by broad rules around data collection and use, and that could be dangerous as companies try to improve their technologies. “I don’t think we really have enough safeguards right now,” Li says.
Correction: Emotion recognition is set to become a $25 billion business by 2023, according to one market estimate. An earlier version of this article misstated this number.