In the past, hiring decisions were made by people. Today, some key decisions about whether someone gets a job are made by algorithms. The use of AI-based job interviews has increased since the pandemic. As demand grows, so do questions about whether these algorithms make fair, unbiased hiring decisions, or whether they find the most qualified applicants. In this second episode of a four-part series on AI in hiring, we meet some of the big players making this technology, including the CEOs of HireVue and myInterview—and we test some of these tools ourselves.
Kevin Parker, Chairman & CEO, HireVue
Shelton Banks, CEO, re:work
Mark Adams, Vice President of North America, Curious Thing AI
Benjamin Gillman, Cofounder and CEO, myInterview
Fred Oswald, Psychology Professor, Rice University
Suresh Venkatasubramanian, Computer Science Professor, Brown University
This miniseries on hiring was reported by Hilke Schellmann and produced by Jennifer Strong, Emma Cillekens, Karen Hao, and Anthony Green with special thanks to James Wall. We’re edited by Michael Reilly. Art direction by Stephanie Arnett.
Jennifer: Work… is a big part of our lives. It’s how most of us pay our bills, feed our families… and put a roof over our heads.
Michelle Rogers: “A permanent job would mean stability. You need something to keep you going and to keep you fresh.”
Dora Lespier: “Like being able to take my daughter being able to get whatever she needs. It would be amazing.”
Henry Claypool: “You know, it’s, it’s a big part of my identity. It’s what I do a lot. And I enjoy trying to make the world a better place through my work.”
[Upsot.. chorus.. “working 9 to 5”]
Jennifer: In the past we left hiring decisions… with people. These days some of those key decisions that lead to whether someone gets a job or not are made by algorithms… which at least in theory could be more objective than humans.
Anchor 2: Well, it’s possible. Some companies are now using artificial intelligence to help hire employees.
Jennifer: I’m Jennifer Strong, and in this second episode of our series on AI and hiring, we look into the rise of AI in job interviews… Just like the software we heard about last episode that decides whether a résumé reaches a human… this software helps decide which interviews reach a hiring manager… and it’s completely changing how the interviewing process works.
Gillman: And one of the things candidates have to do now is, is tons of assignments, psychometric tests, heaps of interviews, and it’s bloody frustrating. And to be honest, it’s pretty black-boxy. You don’t know what results you’re getting. You don’t know how people are viewing you… all of these types of things.
Jennifer: Machines are scoring people on the words they use, their tone of voice—sometimes even their facial expressions. We decided to test some of these tools ourselves… and we found rather unexpected things… like that some of the interviewing systems we tried? Don’t necessarily consider people’s answers to the interview questions.
Hilke: So, you’re saying it didn’t take the transcript at all into consideration. Just the intonation of my voice. And for some reason I scored a 73 percent match with the role.
Clayton: Yeah. Well done [laughs]
Jennifer: Some of these tools didn’t even consider whether the interview questions were answered in the correct language…
[Woman speaking in Mandarin]
Fred: Wow. That’s even more shocking. I would argue, you know, at least with German, maybe there are some cognates that look somewhat similar, but for Mandarin, I can’t imagine how that could be reliable, let alone lead to a high score.
Jennifer: The question is whether these are challenges we can overcome… or whether it’s a sign of a deeper problem.
Suresh: Should we be making better AI systems for hiring, or should we be trying to essentially bring down the entire enterprise?
Jennifer: Remote job interviews without a human on the other end got a huge boost during the pandemic. In these one-way interviews, every person applying for a position has to answer the same pre-recorded questions. And applicants record their answers on their own device. By far the largest player in this space is a company called HireVue. Its customers include more than a third of the Fortune 100… and it’s used by brands like Unilever, JP Morgan Chase, Delta Air Lines, and Target.
Kevin: My name’s Kevin Parker… and I’m the CEO at HireVue, based in Salt Lake City, Utah.
Kevin: HireVue is a 15, 16-year-old company whose primary focus is democratizing hiring. We do that today in over 40 languages and over 180 countries around the world. And the primary way we do that is through on-demand interviews that candidates can interview for jobs any hour of the day, any day of the week. We interviewed nearly 6 million people for jobs last year for our customers. About half of those were for hourly workers and about half of those were professional.
Jennifer: HireVue’s clients are not looking to fill a couple of open positions… but rather, they’re often trying to interview thousands of people at once. When I spoke to Parker earlier this year, he gave an example of a customer who was interviewing 50,000 people for jobs in 1,500 locations… over a weekend.
And he believes the way his company does this… is more fair than the way most humans conduct job interviews.
Kevin: Structured interviewing is the most important way to hire: ask every candidate the same question, ask it the same way, make sure it’s related to work and the skills that they have… And so it’s the ability to deliver structured interviewing at scale that really matters. And we can do that with, uh, with video. And so people can record questions that could be reused multiple times.
Jennifer: To kinda peel that back again – it makes it sound like it’s just a video recording of a person asking a question, but there’s, there’s more to it than that. Your company is also processing what’s happening on the other end.
Kevin: What we’re really looking at there is the words the candidate is using to describe their team orientation or their ability to work independent or their problem-solving skills. So we can assess individual competencies for candidates, and we can use those algorithms to understand the answers that they’re giving.
Jennifer: HireVue’s algorithms are trained on top, middle, and low performers and are looking for the differences between them. The algorithms then compare new video interviews of job applicants against that data.
Unlike some other vendors, the company’s AI does analyze the actual content of what people say… unless a client chooses not to use that feature. It also tries to examine other cues in their voices… and, until recently, claimed it could find meaning in people’s facial expressions… which is incredibly controversial and often criticized.
The company says the small value the facial expression analysis added wasn’t worth the criticism it attracted… and so they’re phasing it out.
But despite the controversy… the use of HireVue’s tools continues to grow … and it’s being used in some unexpected ways.
Shelton: I’m using real live humans, and that’s not dependable… so let’s give this a shot and see what it does.
Jennifer: Shelton Banks is the chief executive of re:work… a nonprofit that gets free access to HireVue.
Shelton: We were skeptics, like most people were—like, man, this isn’t gonna work. This is going to be biased. This is gonna be a danger here. And we’ve realized that there’s danger everywhere… there’s going to be bias everywhere.
Jennifer: His organization aims to help people from diverse backgrounds move up in the workforce and into better-paying jobs.
Shelton: The demographic that I serve is 95% Black and brown, but we’ve been training candidates from what we like to call untapped and overlooked communities for the last five years and been helping people get jobs in the tech sales industry.
Jennifer: The training includes how to do a job interview (with people or with AI). These days he also uses AI-based interviews to select people for the program, but when it first started he used volunteers to scrutinize applicants…
Shelton: Like, we have all these volunteers that want to help and assist us. And so we put together a rubric of questions, behavioral questions that you typically get asked at an interview. And we say on a scale of 1 to 5 how well did they do answering this question, give them feedback.
Jennifer: But the scores the volunteers gave were inconsistent… and often too good to be true.
Shelton: Like, tell me about a time you failed at something. And I had a response go: “I’ve never failed at anything.” Uh, I just like took the glasses off and said, uh, that was horrible. [laugh] Like, what do you mean? You never—like, you are unemployed right now. Like, you have challenges, sir. Like, come on. Like, but interestingly enough, a volunteer would say like, okay, like, man, that’s awesome. Like, man, you—you’re confident. But then I will put the same candidate in a, in a real interview. And then they wouldn’t get a job and be wondering why.
Jennifer: So some people who scored highly in their entrance interviews turned out to be more challenging to teach.
Shelton: We would invite these high-scoring people into the program and then throughout the eight weeks, we would just run into roadblocks. Like, man, you are rough around the edges. Like, I got you for 8 weeks. Like who, who let you, who gave you this gold star… and then HireVue comes along and says, Hey, we got this AI tool. And, uh, I’m like, I don’t want it, like, what’s it going to do? And they’re like here it is. It, uh, it helps you, you know, give a person’s score, give a percentage of how well a person interviews.
Jennifer: The algorithm scores people based on different traits that job candidates probably need in a tech sales job.
Shelton: …And we said like, Hey, everybody take this HireVue assessment. They’re going to record you. We’re going to use AI. And of course the results were kind of all over the place. So I got people scoring 99%. I got people scoring 5%.
Jennifer: So, he ran a series of tests….… just to see what would happen.
Shelton: And so then the next cohort, I said, all right, well, I’m only going to take the people at the top tier, you know, that HireVue said were the best. We’re only going to take them and I’m going to train them. Cohort of 10 folks, I want to say seven of them got jobs. It was, like, difficult for me to get the seven people, jobs that HireVue said were the best. So now I’m going to do the exact opposite. I’m going to get 10 people that are at the bottom and see if I can help give them jobs. Worst cohort of my life. Eight weeks of people that they learned, they grew, but none of them got jobs.
Jennifer: He took a closer look at the results…. And he started noticing some patterns… like that HireVue gives better scores to people who sound convincing… regardless of what they actually say.
Shelton: They talked and the tone and pitch and pace was on point and understandable. But it was like they had no context. It was just kinda like, everything sounded great but, you know, didn’t answer the question necessarily. They just sounded great.
Jennifer: But what happened next surprised him.
Shelton: I would train people and put them in front of a real person. And it was just like, people started to get jobs.
Jennifer: In other words… HireVue seemed to be over-indexing on a candidate’s delivery over content… but so were real hiring managers.
Shelton: People started to get jobs left and right off just like that little piece. You know, it changed the way we recruit and changed the way we train.
Jennifer: These days… Banks uses HireVue to recruit people into the program and to train them to get jobs in the industry.
He doesn’t just take high-scoring candidates, but instead has come up with his own way of doing things, with a mix of high performers, middle-tier applicants, and then 10% coming from the bottom.
Shelton: You get this diversity of thought and diversity of experience in the cohort, which makes for, man, great cohorts.
Jennifer: He believes combining his judgment with AI… helps him make the best decisions about who to choose for his program.
Shelton: You take the HireVue. I see you’re in the bottom bucket and it’s like, man, I want to help you. This is going to be rough. But sometimes, you know, you know, like, uh, they prove HireVue wrong. [breath] You shouldn’t always listen to the tool. You shouldn’t always listen to the tool. But the tool will help you make an informed decision.
Jennifer: But are we really making informed decisions when we auto-score job interviews?
Hilke Schellmann is our reporting partner on this series…
Hey Hilke… You’ve been deep in researching these interviews. Tell us what you found.
Hilke: Yeah. So we found that… video interviews are controversial. We don’t know yet how good AI is in auto-scoring the content of these interviews. Does a computer really “understand” our answers? Can it analyze the many ways humans talk about teamwork, for example?
Video interviews are also controversial, because some of them use AI to analyze job applicants’ facial expressions and the tone of their voices to predict if someone will succeed in a given job. So this raises a few questions: How good is the software in analyzing the words we say? In reading the facial expressions on our faces and intonation in our voices? And also, what facial expressions or tone of voice does one need for a given job? Is this even relevant to a given job?
Jennifer: Yeah, and scientists have repeatedly questioned how good these automated facial and emotion readers are… even… whether it’s something that’s possible to do at all… So, what exactly are these algorithms predicting… and can we see under the hood, so to speak?
Hilke: Sometimes yes… sometimes no… Very few people are actually getting access to see how these black-box algorithms work…. and there’s a lot of concern about these tools in theory, but there isn’t a lot of insight into the tools themselves. That’s why we wanted to try to take a closer look.
Jennifer: So… Altogether, we looked at seven tools with up to five people test-driving each one. Were there any that stood out to you?
Hilke: Yeah… one really did stand out to me. It’s called Curious Thing AI. It’s an AI phone interview platform. I actually found out about this company at a recent HR tech conference.
Mark: Hey, everyone. Good morning. Welcome to our demo.
Jennifer: Mark Adams is its vice president of North America.
Mark: …Curious Thing… for those of you who don’t know us, we are a conversational AI voice interview solution. Essentially, your candidates will do an interview with a voice AI.
Digital Interviewer: Welcome to the digital interviewer hosted by Curious Thing AI. My name is Christine. Thank you very much for joining me today….
Mark: We work especially well when applied to high-volume recruiting scenarios. When you’ve got to hire a lot of people in a short space of time, we can really streamline that process. We use a bunch of really interesting AI technologies like natural language processing, knowledge graph, deep transfer learning, and we have clients right now here in the US, in our home country of Australia, Philippines, New Zealand, and Singapore.
Jennifer: Okay, so the AI is designed to help hiring managers pick out the right people for a job amongst hundreds or thousands of applicants.
Mark: So let’s look at this candidate here, uh, at the top and see what the AI has actually done in terms of the interview. The recording of it is here and I can play it back if I choose to, but generally that’s not the best use of the time because the whole point of it is to reduce your screening time and just focus on candidates you want. So you can listen to it, but we don’t recommend that you do that.
Hilke: So, this was pretty telling to me. Usually a lot of these companies tell me that their AI is just one data point amongst many to make a hiring decision. And hiring managers should really watch or listen at least to some parts of the interviews… which in my mind defeats a little bit the purpose of using an automated tool to make hiring easier… but here Mark Adams spells it out to hiring managers: Do not spend time on the recording.
Jennifer: He also says accents don’t matter.
Mark: Our AI is completely resilient to accents or stutters or any kind of, um, sort of verbal tic that, um, you know, might cause a human, a recruiter to think one way or another about the candidate. So the AI is actually listening to the spoken word, and then it’s being streamed into text in real time. And the analysis is actually just done on the text of the conversation.
Jennifer: So Hilke wanted to test it out… since her first language is German.
AI Interviewer: Please remember. I don’t think there are right or wrong answers here. Let’s start. Tell me about a tough work situation you have gone through. What did you do and what was the outcome?
Hilke: I once had a boss… who was a micromanager…. and that was very hard to deal with because she would second guess everything.
Jennifer: So you got an expert-level score – then you tested it again, speaking only in German… And what were you expecting to see happen there?
Hilke: I believe in stress testing these systems to understand how the scoring really works. And… I’ve talked to many vendors and they’ve told me that if someone has issues speaking or there was another problem with the detection of a person’s voice, the software would recognize the problem. HireVue for example told me that there is a minimum threshold that a candidate needs to meet for the system to score them.
Jennifer: Got it. So you thought you would just get an error message or something when you answered the questions in German instead of English? What actually ended up happening?
Hilke: So, it assessed me speaking German but gave me an English competency score. So… I was scored six out of nine… and my skill level in English came back as competent.
Jennifer: That’s wild. So… you only spoke German, but the software said you were competent in English?
Hilke: Yeah… I was confused too… So I redid the experiment… Same result.
Hilke: So I did a similar experiment with myInterview, which we talked about earlier. It’s a video interview tool in which the algorithm analyzes the words I say and my tone of voice. It then rates how good of a fit I am for the role.
Jennifer: And just for some context – the companies we’re talking about here are much smaller than, say, HireVue – but they aren’t tiny either. These tools are used by millions of people. This particular company, myInterview, was founded in Israel and also operates in the US, UK, and Australia.
Benjamin: So our customers are typically, um, small, medium enterprises. Diverse 4,000 companies use our platform from a myriad of different industries. We’ve got a very large candidate pool, um, over 3.4 million candidate interviews through the, through the site.
Jennifer: Benjamin Gillman is the company’s chief executive. He spoke to us back in April… and said they only need 30 seconds of audio to give an insight into a candidate’s personality. Plus.. he said the tool works with different accents.
Benjamin: The error is, is quite negligible. The insights we’re giving could be a 0.2% change, maybe in the assumption that this person is outgoing. Because we’re overlaying tone on top of text, we’re able to mitigate a lot of that, because tone portrays a lot where sometimes language is deficient.
Jennifer: It seems almost magical to pull full-blown personality profiles out of 30 seconds of audio and text, but Gillman says his team is working to keep their tools from being black boxes.
Benjamin: Our goal is to be very transparent in this and to really communicate exactly what’s happening and how it’s happening and, you know, how the machines are working. We aren’t looking to say, this person is the right hire. All we’re trying to do is help with search. It’s not a system that you can game. It’s not a, uh, it’s not something that’s going to discriminate against you.
Jennifer: Right. So Hilke you also tried speaking German to myInterview… and you got a score there too.
Hilke: I did get a score, but first I got a transcript of what I said.
Jennifer: Ok, and this is what it interpreted your words to be… so I’m just going to read from this transcript here… which doesn’t exactly make a lot of sense.
So humidity is desk a beat-up.
Sociology, does it iron?
Mined material nematode adapt.
Secure location, mesons the first half gamma their Fortunes in
Hilke: Yup, apparently that’s part of the answer I gave to the first question where I had to tell the machine about myself. And… as you heard…. It’s gibberish.
Jennifer: Ok, what results did you get from this?
Hilke: I was scored a 73 percent match for the role although I didn’t speak a word of English and the things I said in German had nothing to do with the questions I was asked or with the job itself.
Jennifer: .. because you were reading German off a Wikipedia page.
Hilke: Yeah… actually 73% is pretty high. And I wanted to make sure this finding wasn’t just about me speaking German… so… one of the graduate students who I work with was kind enough to record herself in Mandarin reading the same Wikipedia text – she scored 80% – and her English transcript is gibberish, just like mine.
Jennifer: Right… bringing us back once again to this question of… Are these machines making decisions based on scientific evidence or are they just guessing?
Hilke: Yes… That’s the elephant in the room.
Jennifer: Alright, so we’ve reached back out to these companies and we’ll report anything we learn later in this episode… For his part, myInterview’s CEO told us back in April that he’s receiving very good feedback on this product from customers.
Benjamin: We see that they are hiring people that they might not have considered previously, and that are, you know, very good fits for their companies, and it’s hopefully, uh, uh, a less painful and more informed process.
Jennifer: All this makes me really wonder what would happen if researchers ran a lot of tests and scenarios over a longer period of time… what might they find?
Hilke: I would love to see their results.
[quick music beat]
Jennifer: We’ll be back … in a moment
[quick music transition]
Jennifer: To some… the results we found might not be surprising. Many have called for a ban on AI in hiring… saying the flaws in these tools (that millions are using to try and land a job) are just not redeemable…
Other experts say such a decision would be hasty and uninformed…sure, there are kinks — but the promise of conducting fairer interviews at scale is too great to let go.
So… we called up an expert ….
Suresh: My name is Suresh Venkatasubramanian.
Jennifer: He’s a computer scientist and a professor at Brown University.
Suresh: Should we be making better AI systems for hiring, or should we be trying to essentially bring down the entire enterprise? It’s this tension I think between, I think what people have called the abolitionist and the incrementalist viewpoint is sort of at the heart of literally every day when I think about these things.
Jennifer: He’s also an AI ethicist who used to serve on HireVue’s advisory board.
Suresh: And there’s no single answer to these questions, I think it depends on the circumstances. So I think the truth is that for a while, I thought that it made sense to try and change things from the inside, and at some point when you feel that maybe that’s not going to happen, then you don’t feel like you’re being effective anymore. And then I felt, okay, it was time to maybe not be there.
Jennifer: He was at HireVue while the company was still selling facial expression analysis to customers. He had doubts these technologies were backed by solid science. And he shared his concerns with the company.
Suresh: Initially the reaction was, okay, let’s not, you know, immediately stop doing this, but let’s look into this more. Let’s be careful. Okay, fine. Let’s do that. So then you wait and then it comes up again and then you see it again. And at some point you realize that they probably weren’t going to stop doing it. And it didn’t matter what I said.
Jennifer: When HireVue announced it was phasing out the use of this tech, he tweeted in response: “It’s about time. I used to work with HireVue on their issues around bias and eventually quit over their resistance to dropping video analysis.”
Jennifer: And.. he says companies often hide behind claims that their AI is accurate…but that isn’t the full picture.
Suresh: Saying that your system is accurate, merely means that your system matches what the training data says it should do. And so then there’s also the question of, well, is the training data accurate, which leads you down rabbit holes that often people don’t want to go down.
Jennifer: In other words, he says he became more and more convinced that some vendors in the industry don’t really want to know what their AI is basing its predictions on.
Suresh: They can put all the guardrails around it, but they have to sell a product. And if your marketing pitch involves a certain tool, an AI tool, a scalable tool, that’s the thing you’re selling. You’re not going to stop selling it. The question is always, if at some point someone says, you shouldn’t do this. Are you going to stop doing it or not? And if you’re not, then there’s no longer a place for the person who says you should not be doing this.
Jennifer: It made him question whether self-regulation is even possible for many companies… and what it is he’d like to see.
Suresh: I would like to see the industry to be more honest, I think, and reflective about what is it they’re trying to solve here. I honestly think that all of this is really about scale, which itself is not a bad thing but there are consequences of going with systems that scale like that.
Suresh: The underlying assumptions about what is valued and what is not are the key here. And so I’m often a little skeptical when I hear well, we want to build us to be more fair. Sure. But within the context of scale, being your primary goal.
Jennifer: After we taped this interview in February, Venkatasubramanian accepted a position in the Biden administration, with the White House Office of Science and Technology Policy.
Jennifer: Before we go… we want you to know we told the companies behind the systems we tested about what we did… and about our results. Curious Thing AI thanked us for testing their system… myInterview did too… and they also got on the phone with us to talk about what we found.
Clayton: It knows that you weren’t speaking English for sure.
Jennifer: Clayton Donelly is myInterview’s industrial-organizational psychologist.
Clayton: Um, but then it defaults to audio only then it won’t use your content because the content you can see, it’s like, um, it’s random, it’s random nonsense. So it won’t, it won’t read that at all, unless we tell it to do that.
Hilke: So, you were saying it didn’t take the transcript at all into consideration. Just the intonation of my voice. And for some reason I scored a 73% match. Like, the intonation of my voice was a 73% match with the role.
Clayton: Yeah. Well done [laughs]
Jennifer: Ok, so he says the system “understood” that you weren’t speaking English…
Hilke: But if the system scored me on the intonation of my voice alone, I still don’t understand how it could pull a full personality analysis on me, showing for example that I am way more innovative than consistent. How do you find that in the sound of my voice?
Jennifer: So, we shared these findings with psychology professor Fred Oswald at Rice University, who does research in artificial intelligence and hiring.
Fred: If information is being pulled from a video interview it’s definitely important, I would say a responsibility, to understand whether what is being measured is job relevant – what is being measured is understood fairly across applicants, no matter what your background is.
Jennifer: And we played him our tests in German… and Mandarin.
Fred: Wow. That’s even more shocking. I would argue, you know, at least with German, maybe there are some cognates that look somewhat similar, but for Mandarin, I can’t imagine that the score is… at least the way you showed the text, how that could be reliable, let alone lead to a high score.
Jennifer: Basically… he doesn’t think some of these tools should be used to make high-stakes hiring decisions.
Fred: Intonation… I, I don’t, I don’t see much research evidence, um, maybe more work needs to be done. Right? What are the situations where intonation would provide any job relevant information? I think the argument currently has to be that we, we really can’t use intonation as data for hiring that just doesn’t seem fair or reliable or valid.
Jennifer: So, he wants scientists to take a closer look at these tools.
Fred: We want to encourage innovation, but research needs to continue to catch up and gather data on how reliable are these scores? Like what evidence tells you that whatever is being measured reliably and fairly is actually relevant to an employee’s success in the organization, which is good for the organization. That that’s why they’re presumably paying for these tests, but also good for the, for the job applicant, because you want to be doing the right thing and getting rewarded for your performance.
Jennifer: Next episode… we turn our attention to AI games that are used to evaluate potential employees…
Anonymous Jobseeker: Not everyone thinks the same. So how are you inputting that diversity and inclusion when you’re only selecting the people that can figure out a puzzle within 60 seconds?
Jennifer: We present that criticism to someone who designs these games.
Matthew Neale: You know, the disconnect, I think for this candidate was between what the assessment was getting the candidate to do and… this is why it’s relevant… and this is why we’re using it in this particular job.
Jennifer: AND… in part three of this series, we’ll also take a closer look at why the use of AI in hiring isn’t really regulated.
Jennifer: This miniseries on hiring was reported by Hilke Schellmann and produced by me, Emma Cillekens, Karen Hao and Anthony Green with special thanks to James Wall. We’re edited by Michael Reilly.
Thanks for listening… I’m Jennifer Strong.