AI Reads Human Emotions. Should it?

In part two of our series on emotion AI, we explore the sticky concepts of bias, safety and use.

October 14, 2020

AI can read your emotional response to advertising and your facial expressions in a job interview. But if it can already do all this, what happens next? In part two of a series on emotion AI, Jennifer Strong and the team at MIT Technology Review explore the implications of how it’s used and where it’s heading in the future.

We meet:

Shruti Sharma, VSCO
Gabi Zijderveld, Affectiva
Tim VanGoethem, Harman
Rohit Prasad, Amazon
Meredith Whittaker, NYU's AI Now Institute

Credits:

This episode was reported and produced by Jennifer Strong, Karen Hao, Tate Ryan-Mosley, and Emma Cillekens. We had help from Benji Rosen. We’re edited by Michael Reilly and Gideon Lichfield.

Full episode transcript:

Jennifer Strong: In this era of Covid-19, so many of the relationships and experiences we once enjoyed in person are now mediated by technology. Whether you're working from home, helping educate kids from the confines of your own walls, or, like so many of us, doing both, a new constant that probably isn't going away is the video call. Whether you're in a Zoom room, Google Meet, or something else, the reality is it’s likely here to stay even after this current moment passes. And there’s something about this experience that feels distanced and disconnected, as if you’re communicating through a filter. Simple conversations get interrupted by internet failures or lost in translation. What if this technology could be used to enhance interactions, rather than mute them? What if it could respond to your body language and your vocal intonations, to help you convey more than just words? It's something Affectiva founder Rana el Kaliouby hinted at when we spoke for our last episode.

Rana el Kaliouby: Where I am presenting to, say, a hundred remote people, if it were a real live event, I would riff off the energy of the people in the room, and I can't do that online. And it's, it's really actually painful and I hate it. [laugh] So, I keep imagining, like, a real-time graph of level of engagement, level of laughter, maybe some emoji stream or something that just gives people a sense of this shared experience.

Jennifer Strong: Her company and many others are rushing to work on these kinds of possibilities, but much of the technology that would power such things is already in use, often in unexpected ways, such as measuring facial expressions in people who’ve suffered a stroke.

Gabi Zijderveld: There's a lot of social stigma attached to that because people think that these people look angry or they're scowling, but they had a stroke and can't smile. Typically surgeons measure whether or not their rebuilding of smiles is successful by tickling their patients and then they start laughing and they would use a ruler and measure millimeter movements.

Jennifer Strong: And one surgeon thought that method was ridiculous.

Gabi Zijderveld: So, using our emotion AI, he built a software system to benchmark and measure how successful he is at rebuilding patients' smiles.

Jennifer Strong: I’m Jennifer Strong and in part two of our series exploring emotion AI we look at how it’s already being applied, and where that might lead us in the future.

[SHOW ID]

Shruti Sharma: Ok, so I’m going to share my screen here. Let me know if you can see it.

Jennifer Strong: Shruti Sharma is the senior engineering manager for machine learning at the photo app called VSCO.

Shruti Sharma: And so you're just seeing these orange blinds in a dark, black-looking room. And the related images that are showing up all have this mysterious sense of play of light and darkness, of lights and shadows, essentially.

Jennifer Strong: These photos were chosen by Ava, an AI that assesses images and categorizes them by mood and emotion, a type of sorting that used to be done by people. And we’re not talking about beach landscapes falling into the same pile, or a whole bunch of cat photos grouped together. This is much more nuanced. I didn’t expect this. It’s not just the same shade of color; it really is evoking the same kind of feeling from photo to photo.

Shruti Sharma: So for us, the goal of recognizing the mood and the feeling or emotion in a photo is essentially to capture that essence that only a human would otherwise be able to see. Our machine learning technology, Ava, sort of looks not just at the, at the content of the photo, but also at attributes that are very specific to photography that contribute to these feelings and emotions that a photo produces in humans… things like composition, shot style, aesthetic.

Jennifer Strong: This app is a creative platform for photographers.

Shruti Sharma: The fact that all of these images sort of have this feeling of there's more happening here than what we're seeing in the photo. It's hard to tell what's going on fully, and there's like the sense of mystery here which is being evoked. It's a very personal feeling that photographs evoke and for a machine to be able to sort of match images based on that almost brings joy to my heart. [laugh]

Jennifer Strong: Unlike other applications of this technology, if the AI misses a category or a classification, the stakes here are pretty low.

Shruti Sharma: Which I think is also the beauty of it. Right? Is the machine like actually making an error or is it just giving you a different perspective? And I think it's a little bit of both sometimes.

Jennifer Strong: But sorting photos is just the tip of the iceberg.

Gabi Zijderveld: So, if you could click on that…

Karen Hao: Ok.

Gabi Zijderveld: And it's best if you drive, because then you can try out the demo.

Jennifer Strong: Gabi Zijderveld is the chief marketing officer at Affectiva. She’s walking my colleague, Tech Review’s senior AI reporter Karen Hao, through a demonstration of one of their products that can read emotions.

Gabi Zijderveld: So this basically is a simplified version of how our technology would be deployed in media analytics and specifically in ad testing. It just more or less gives you a flavor of how we do this and what kind of things we can measure. So, as you can see, we have a few different ads or videos, preloaded, and then you can just pick one that jumps out for you.

Karen Hao: Pressing play…

Gabi Zijderveld: So as you're viewing this ad, as you can see, we're measuring your reactions to it.

Karen Hao: [Audible laugh]

Jennifer Strong: She’s watching a funny YouTube clip in which two children and the tasks of being a working mother, among other things, interrupt a news program. It’s a spoof of the viral BBC interview (you know the one) where an adorable little girl waltzes into her dad’s home office.

Gabi Zijderveld: So basically in this demo, we asked you for permission to turn on your web camera. And as you started to play the video, basically frame by frame, our AI was measuring your reactions and your responses to what you were seeing in this ad. On the left-hand side of the demo, you see all these different metrics, such as expressiveness, attention, disgust, smile… it shows you or it highlights for you where you had instances of brow furrow, which could be people kind of questioning or raising, raising their eyebrows if you will, literally. If you click on a smile curve, that could be interesting... a smile doesn't necessarily mean always that you're enjoying something, but here we know the context, we know this is a humorous ad, and we can see from your curve that you were smiling a lot. So clearly this video was having its intended effect with you. And then if you click on views summary, it's kind of cool as well. This compares your data to everyone else that has viewed this video.

Karen Hao: Wow. I guess I was way more expressive than the average person.

Gabi Zijderveld: Yeah, yeah, exactly. Also your smile was way higher. Clearly you liked this video much better than the average.

Jennifer Strong: What they’re demonstrating here reads emotion and breaks it down into data for advertisers and ad agencies.

Gabi Zijderveld: That data again is very important insight because it basically helps them determine how effective their ads are and where to put their media spend.

Jennifer Strong: She says they need about a hundred tests to get enough data for an effective comparison. Affectiva has deployed this emotion AI technology all over the world with clients like Disney, Coca-Cola, Kellogg's, Samsung, and Google.

Gabi Zijderveld: About 28% of the Fortune Global 500 companies use our technology. Ad Age has a list of the world's largest advertisers, and 70% of them use our technology. Over the years we've tested more than 52,000 ads in 90 countries. So there's a lot of research being done. Let's say someone is testing a beer ad in the UK, and they would like to compare the performance of their beer ad to other beer ads in the UK, or maybe to a beer ad in the US; they can do that because we have so much data that kind of provides these norms.

Jennifer Strong: And after 10 years of refining emotion AI in media analytics… Affectiva is now moving into the entertainment industry.

Gabi Zijderveld: For example, we have already done a number of studies where it's really about understanding how audiences engage and almost bond, if you will, with certain characters in TV programming. They introduce new characters. Sometimes those characters stick and audiences love them. Sometimes they don't. So they do a lot of research on that. And of course they try and predict what will be kind of the successful formula, and doing that early on can save them a lot of money; getting it wrong is expensive in those cases. We've also done movie trailer testing, which is in itself a type of advertising content.

Jennifer Strong: But it’s the auto industry that Affectiva and many other companies are especially focused on.

Gabi Zijderveld: So aside from the safety application by understanding driver impairment, which is basically a driver monitoring system, there's other applicability as well, because the moment you can understand what's going on with the backseat passengers, there's lots of other interesting things you can do. You can basically adapt the environment to the state of the person in the moment. So someone might be on their way to a work meeting in the future. Your vehicle might understand that because all the different systems are tied together. Maybe it can look into your calendar if you give permission for that. And if you're on your way to a meeting, maybe you don't want music. Maybe you don't want to see video in the back. Maybe you want the right lighting in your region of the car. Maybe you need your seat positioned in a certain way. It could also be about basically improving the experience, right? Making it more fun and enjoyable or more restorative.

Jennifer Strong: It’s not on the market yet but she says they’ve tested elements of emotion AI in cars, including this:

Gabi Zijderveld: There was even, a number of years ago, a research project that we did with Porsche, where we basically had our technology assess how people were reacting to music that was being played in the vehicle. And if they liked the music, it would basically adjust recommendations based on people's reactions and, and personalize it in that way.

Jennifer Strong: She estimates it’ll be 2 to 4 years before this type of technology is on the road.

Another group working on this kind of emotion AI for vehicles is Harman, a company owned by Samsung.

Tim VanGoethem: We are building these algorithms that could use a camera image from a dash mounted camera and we're building the software that can look at your eyes or other facial features and be able to infer the driver's state.

Jennifer Strong: That’s Tim VanGoethem, the company’s head of advanced mobility.

Tim VanGoethem: So we could do things like understand your heart rate, just by looking at how the pigmentation of your skin changes each time your heart beats. We can look at characteristics of how your eyes move and, and based on that can correlate that to whether you're distracted or you're drowsy or you're stressed or your mind doesn't seem to be focused… And so, understanding how to make that meaningful. If you’re tired, can we make simple adjustments in the cabin temperature? Can we change the seat position subtly? Pick a mix of music that's maybe a little more uptempo as opposed to what you're currently listening to. So, it could be the combination of some very simple adjustments to some very big steps, depending on what we're able to infer your condition is.

Jennifer Strong: So how much further could companies like this take the technology?

Tim VanGoethem: Where that could go in the future is interfacing to wearables that you bring into the car. A lot of people love their smartwatches and quite often those smartwatches have a fitness or wellness capability as well. So the car in the future would be able to bring information from those wearables in. So instead of trying to use a camera to infer or deduce your heart rate, we could actually use the sensor from the smartwatch itself. And then the smartwatch, with the user's permission, could share that information with the car. And then obviously it's, it, it might be a more precise signal. The step beyond that is we could also tap into even bigger ecosystems. So as an example, if the car and the algorithms in the car knew that you had had a poor night's sleep before you got into the car, we could make some prior decisions knowing you’re already coming into the car potentially tired.

Jennifer Strong: During the next 2 to 5 years he’s going to be trying to answer this question.

Tim VanGoethem: How does the car not just solve problems for people while they're in their car, but how can they plug into these bigger ecosystems? And, and I think that that will evolve over time as people understand the relationship of how does my life outside of the car and my life inside the car naturally blend together.

Jennifer Strong: As for how comfortable drivers will be allowing their car to read their emotions and respond and even connect to their life outside the vehicle? Well, we’re already letting machines do this in our most intimate spaces: inside our homes.

Rohit Prasad: Just as we do as humans - we cherish interactions with people who are humble, helpful and relatable and trustworthy. And of course you want some kind of fun as well in the personality so that you can be very engaging.

Jennifer Strong: Rohit Prasad is the chief scientist behind Amazon’s Alexa. When the voice assistant was released six years ago, it represented the dawn of a new kind of relationship with personal technology. For it to work, he knew they’d have to create a personality that would work for different people in different spaces.

Rohit Prasad: We believe trust is not won just by how you sound, but what you say and what you do. We wanted people like us to be comfortable talking to an AI in their homes… And our homes are a communal environment. It's not just a personal device like your smartphone. And in that setting Alexa is going to interact with me, my wife, my kids. Right? So now you have to make the voice work for all these environments.

Jennifer Strong: They knew cultivating trust with users would also rely on Alexa’s ability to recognize and respond to people’s emotions.

Rohit Prasad: When customers are happy or excited, Alexa should mimic that behavior. And when the customer is disappointed, Alexa should take more of an empathetic tone.

Jennifer Strong: These days Alexa can do some of the basics.

Rohit Prasad: So some of the emotional responses you can see is when you're asking about your favorite sports team and if they won, Alexa will be more jovial in the response.

Jennifer Strong: Plus, Amazon is continuing to work on its listening skills.

Rohit Prasad: So can Alexa sense your vocal frustration and alter her responses to you? So that Alexa picks up your vocal frustration and adapts her responses in the right way to give you what you need as a customer instead of just frustrating you more by giving the same response.

Jennifer Strong: And Alexa can already do a version of mirroring its user:

Alexa recording: Let me tell you a secret. I can whisper.

Rohit Prasad: This is what we call a whisper mode. This happened when one of my earlier bosses at Alexa came home. He whispered to Alexa and Alexa responded and woke up his wife.

Jennifer Strong: He says it inspired them to teach Alexa to whisper, because if you whisper to a human, that person usually whispers back. They’re also teaching it when to reply as an expert, and how to recognize when it isn’t one. He says it depends on the type of question.

Rohit Prasad: In certain settings when it's very sensitive topics that you may share with a companion, but Alexa may not be the expert at it, for instance, mental health. Or any other health issues. There, I think our guiding tenet is always to get the expert help in those settings and I think even though you're sharing it as a companion there's a huge responsibility for the AI at that point.

Jennifer Strong: As Alexa gets to know its users, it could play an increasing role in their lives.

Rohit Prasad: I think that relationship keeps growing from more of an assistant to advisor and even companion for people. And we are seeing that happen in today's time. And I think as you grow with Alexa the relationship evolves.

Jennifer Strong: But he cautions:

Rohit Prasad: These are still early days and we will continue to do the research to figure out what is the best emotive response or stylistic response based on the customer input. But I just want to make sure that we're on the same page that it is a pretty hard problem because you don't want any mistakes to amplify and make customers lose trust with Alexa. In fact, you want more trustworthy action, which means it's probably okay to have more neutral responses in certain settings. And that's what we are working on right now as well.

Jennifer Strong: Emotion AI can enhance our interactions with technology, but depending on how it’s used, it can also cause real harm. We’ll dig into some examples of that right after the break.

[midroll ad]

Jennifer Strong: The pandemic has seen a spike in the use of digital programs that supervise school exams. These products use behavioral tracking and recognition to watch students through their laptops and look for signs of cheating. The New York Times spoke to a college junior who has a facial tic disability, and a recent Afro-Latina law graduate who struggled for four hours just to get the software to register her face. These problems have also cropped up in interviewing software, which uses similar face and behavior tracking to help decide whether you’re worthy of being hired.

Meredith Whittaker: The idea that people's access to jobs and opportunity are being shaped by what could be considered kind of stereotypes and assumptions about how they look, how they act, whether they speak in a way that is commensurate with some or another model of success is incredibly troubling.

Jennifer Strong: Meredith Whittaker is the co-founder of the AI Now Institute at NYU.

Meredith Whittaker: And we study the social implications of artificial intelligence.

Jennifer Strong: Beyond just labelling emotions, she says, this is really about interpreting the value of those emotions and automating decisions with that information.

Meredith Whittaker: This is about corporate power and the way in which these companies are producing technologies that make fantastical claims almost always hidden behind veils of trade secrecy. They are unaudited. They are unexamined. We're seeing technology that claims to be able to detect people's interior character, their competence, their feelings, being deployed to inform decision making in the classroom, who to hire and who to promote. We're seeing it deployed in the criminal justice system - in places that are really shaping people's lives and access to opportunities. And given that there is no scientific consensus on the efficacy of automating these kinds of claims, we thought it was time to call for a ban on using these technologies in these domains.

Jennifer Strong: And she says what’s perhaps most troubling is how often these technologies are used without the people subjected to them having any idea, or recourse.

Meredith Whittaker: It's very difficult to know if I, if I'm not hired for a job and I was interviewed by HireVue, was it because the software is biased or because I wouldn't have gotten the job anyway. Right? Like, a sample of one is difficult to draw a conclusion from. And the data that would allow us to detect a pattern of discrimination or harm or error or what have you, is not in the public domain. The folks who have access to that data are HireVue and then whatever company is licensing HireVue to use it for interviews, right? And neither of those companies have real interest in allowing the public and lawmakers and advocates to comb through that data.

Jennifer Strong: And so her group is calling for a ban on emotion AI in sensitive use cases, which would include applications that may not seem so sensitive, like a beep from the dashboard of a car when it thinks a driver might be getting drowsy.

Meredith Whittaker: We sort of have to back up here and, and look at kind of the power dynamics here, right? Perhaps a car that blinks a little light when it thinks that you're drowsy based on how your face looks might not be that harmful. But is that information sent to your insurance company? Is that information used to set your insurance rates or to adjudicate your fault if you get into an accident? Does it prevent Lyft or Uber drivers from signing onto their app? What are the uses of this data?

Jennifer Strong: What she's saying might sound overly cautious, but last year New York’s top financial regulator ruled that life insurance companies can set premiums in that state based on information from social media. So what's to stop data from cars, or anything else, from being used to set your insurance rates in the future? Right now, in most places, nothing.

Jennifer Strong: This episode was reported and produced by me and Karen Hao, Tate Ryan-Mosley and Emma Cillekens. We had help from Benji Rosen. We’re edited by Michael Reilly and Gideon Lichfield. Thanks for listening, I’m Jennifer Strong.
