Lie detectors have always been suspect. AI has made the problem worse.

An in-depth investigation into artificial-intelligence-based attempts to recognize deception.

Jake Bittle

March 13, 2020

Nicolás Ortega

Before the polygraph pronounced him guilty, Emmanuel Mervilus worked for a cooking oil company at the port of Newark, New Jersey. He was making $12 an hour moving boxes, but it wasn’t enough. His brother and sister were too young to work, and his mother was fighting an expensive battle against cancer. His boss at the port, though, had told him he was next in line for promotion to a technician position, which would come with a raise to $25 an hour.

Mervilus was still waiting for this promotion on October 19, 2006, when he and a friend stopped at a Dunkin’ Donuts in nearby Elizabeth, New Jersey. A few minutes later, as they walked down the street, two police officers approached them and accused them of having robbed a man at knifepoint a few minutes earlier, outside a nearby train station.

The victim had identified Mervilus and his friend from a distance. Desperate to prove his innocence, Mervilus offered to take a polygraph test. The police agreed, but in the days right before the test, Mervilus’s mother died. He was distraught and anxious when the police strapped him up to the device. He failed the test, asked to take it again, and was refused.

After Mervilus maintained his plea of innocence, his case went to trial. The lieutenant who had administered the polygraph testified in court that the device was a reliable “truth indicator.” He had never in his career, he said, seen a case where “someone was showing signs of deception, and [it later] came out that they were truthful.” A jury convicted Mervilus—swayed, an appeals court later found, by misplaced faith in the polygraph. The judge sentenced him to 11 years in prison.

The belief that deception can be detected by analyzing the human body has become entrenched in modern life. Despite numerous studies questioning the validity of the polygraph, more than 2.5 million screenings are conducted with the device each year, and polygraph tests are a $2 billion industry. US federal government agencies including the Department of Justice, the Department of Defense, and the CIA all use the device when screening potential employees. According to 2007 figures from the Department of Justice, more than three-quarters of all urban police and sheriff’s departments also used lie detectors to screen hires.

But polygraph machines are still too slow and cumbersome to use at border crossings, in airports, or on large groups of people. As a result, a new generation of lie detectors based on artificial intelligence has emerged in the past decade. Their proponents claim they are both faster and more accurate than polygraphs.

In reality, the psychological work that undergirds these new AI systems is even flimsier than the research underlying the polygraph. There is scant evidence that the results they produce can be trusted. Nonetheless, the veneer of modernity that AI gives them is bringing these systems into settings the polygraph has not been able to penetrate: border crossings, private job interviews, loan screenings, and insurance fraud claims. Corporations and governments are beginning to rely on them to make decisions about the trustworthiness of customers, employees, citizens, immigrants, and international visitors. But what if lying is just too complex for any machine to reliably identify, no matter how advanced its algorithm is?

Inquisitors in ancient China asked suspected liars to put rice in their mouths to see if they were salivating. The Gesta Romanorum, a medieval anthology of moral fables, tells the story of a soldier who had his clerk measure his wife’s pulse to figure out if she was being unfaithful.

photograph of Gesta Romanorum book cover — An English translation of the *Gesta Romanorum*, or Deeds of the Romans, a collection of stories originally published in Latin in the late 13th or early 14th century.
Wikimedia commons

As the United States entered World War I, William Marston, a researcher at Harvard, pioneered the use of machines that measured blood pressure to attempt to ascertain deception. A few years later, inspired by Marston’s work, John Augustus Larson, a police officer who had just completed his PhD in physiology at the University of California, Berkeley, developed a machine he called a “cardio-pneumo psychogram,” which provided continuous readings of a subject’s blood pressure, pulse, and breathing rate. These readings, Larson claimed, were an even better proxy for deception than blood pressure alone.

Larson first used the machine to investigate a theft in a women’s dorm at Berkeley, and within a year, it had been used to convict a man in San Francisco who was accused of murdering a priest. By the 1930s, one of Larson’s protégés was selling a portable version to police departments around the country, adding a sensor that measured changes in galvanic skin response—the more a subject sweated, the more conductive the skin would be. By the 1970s, millions of private-sector workers were taking regular polygraph tests at the behest of their employers.

Most polygraph tests today have the same basic structure as Larson’s: the examiner asks a series of questions to measure a subject’s normal physiological state, watching while the machine transcribes these measurements as waveform lines on a page or a screen. The examiner then looks for sudden spikes or drops in these levels as the subject answers questions about suspected crimes or feelings.

But psychologists and neuroscientists have criticized the polygraph almost since the moment Larson unveiled his invention to the public. While some liars may experience changes in heart rate or blood pressure, there is little proof that such changes consistently correlate with deception. Many innocent people grow nervous under questioning, and practiced liars can suppress or induce changes in their body to fool the test. Polygraphs can also be beaten by biting one’s tongue, stepping on a tack, or thinking about one’s worst fear. The devices always risk picking up confounding variables even in controlled lab experiments, and in real life they are less reliable still: since criminals who beat the test almost never tell the police they were guilty, and since innocent suspects often give false confessions after failing the tests, there is no way to tell how well they actually worked.

American inventor Leonarde Keeler (1903-1949) testing his lie-detector on Dr. Kohler, a former witness for the prosecution at the trial of Bruno Hauptmann. — Leonarde Keeler, a protégé of polygraph inventor John Larson, administered the test to Bruno Hauptmann, who was arrested, convicted, and executed for the kidnapping of Charles Augustus Lindbergh Jr. Hauptmann maintained his innocence until his death.
Public domain

Because of these limitations, polygraph tests have long been inadmissible in most American courts unless both parties consent to their inclusion. Federal law has barred private employers from polygraphing their employees since 1988 (with exceptions for those in sensitive jobs, like armed guards or pharmaceutical distributors, and for some employees who are suspected of stealing or fraud). The American Psychological Association cautions, “Most psychologists agree that there is little evidence that polygraph tests can accurately detect lies,” and a 2003 report from the National Academy of Sciences, echoing previous government research, famously found that the device detects liars at rates “well above chance, though well below perfection”; the report’s lead author said at the time that “national security is too important to be left to such a blunt instrument.”

But perhaps the instrument need not be so blunt. That’s the promise being made by a growing number of companies eager to sell lie-detection technology to both governments and commercial industries. Perhaps, they say, certain complex patterns of behavioral tics could signal lying more reliably than just an elevated pulse or blood pressure. And perhaps a sophisticated algorithm could spot those patterns.

From 1969 until 1981, a serial killer nicknamed the “Yorkshire Ripper” preyed on young women in the north of England, killing at least 13 and attempting to kill at least seven others. Police interviewed and released him nine different times as his killing spree continued. His last victim was Jacqueline Hill, a 20-year-old student at the University of Leeds, who was killed in November 1980. A few months later the police finally caught him as he prepared to kill a prostitute in nearby Sheffield.

When Janet Rothwell arrived at the University of Leeds in the fall of 1980, she lived in the dormitory next door to Hill’s. She found herself haunted by Hill’s murder.

“She caught the bus from the university library about the same time as me,” said Rothwell, “and she was murdered after she got off the bus.” Rothwell later learned how long it took to catch the murderer. “I wondered,” she recalled, “could a computer flag some sort of incongruence in behavior to alert the police?”

Rothwell eventually went on to graduate school at Manchester Metropolitan University (MMU) in the late 1990s. She met Zuhair Bandar, an Iraqi-British lecturer working in the computer science department. Bandar had recently had a “eureka moment” when a marketing company had asked him to create a rudimentary device for measuring consumers’ interest in products they saw on a screen.

An FBI photo of a woman taking a polygraph.
Federal Bureau of Investigation

“They’d give the customer a handheld device,” said Bandar, “and if they approve, they press 1; if they don’t like it they press 2. I thought, why do we need handheld devices if there are already expressions on their faces?” Bandar asked Rothwell to stay at MMU after her master’s to pursue a PhD and help him design software that could analyze faces to extract information. Deceptiveness, they figured, was no less detectable than joy or anger. All would create some form of “incongruence”—behavioral patterns, whether verbal or nonverbal, that a computer might discern.

Rothwell trained a neural network in the early 2000s to track facial movements like blinking and blushing, and then fed the computer a few dozen clips of people answering the same set of questions honestly and dishonestly. To determine what the liars had in common, the computer examined an individual’s facial movements, the relationships between those movements, and the relationships between those relationships, coming up with a “theory” too complex to articulate in normal language. Once “trained” in this way, the system could use its knowledge to classify new subjects as deceptive or truthful by analyzing frame-by-frame changes in their expressions.

In a 2006 study, the system, called Silent Talker, was made to guess whether a subject was lying or telling the truth. It never achieved more than 80% accuracy while Rothwell was working on it—nor has it done substantially better in anything the research group has published since. Rothwell also told me it broke down altogether if a participant was wearing eyeglasses, and she pointed out, “You have to remember that the lighting conditions were the same and the interviews were based upon the staged theft.” But even in the early stages, Rothwell recalls, Bandar was “keen to have a commercial product”; he and a colleague once presented her with a video of a woman suspected of cheating on her husband and asked her to have Silent Talker analyze it, just like in the Gesta Romanorum.

Rothwell had her reservations. “I could see that the software, if it worked, could potentially be intrusive,” she said. “I don’t think that any system could ever be 100%, and if [the system] gets it wrong, the risk to relationships and life could be catastrophic.” She left the university in 2006; after training as an audiologist, she found a job working in a hospital on the Isle of Jersey, where she still lives.

MMU put out a press release in 2003 touting the technology as a new invention that would make the polygraph obsolete. “I was a bit shocked,” Rothwell said, “because I felt it was too early.”

The US government was making numerous forays into deception-detection technology in the first years after 9/11, with the Department of Homeland Security (DHS), Department of Defense (DoD), and National Science Foundation all spending millions of dollars on such research. These agencies funded the creation of a kiosk called AVATAR at the University of Arizona. AVATAR, which analyzed facial expressions, body language, and people’s voices to assign subjects a “credibility score,” was tested in US airports. In Israel, meanwhile, DHS helped fund a startup called WeCU (“we see you”), which sold a screening kiosk that would “trigger physiological responses among those who are concealing something,” according to a 2010 article in Fast Company. (The company has since shuttered.)

Bandar began trying to commercialize the technology. Together with two of his students, Jim O’Shea and Keeley Crockett, he incorporated Silent Talker as a company and began to seek clients, including both police departments and private corporations, for its “psychological profiling” technology. Silent Talker was one of the first AI lie detectors to hit the market. According to the company, last year technology “derived from Silent Talker” was used as part of iBorderCtrl, a European Union–funded research initiative that tested the system on volunteers at borders in Greece, Hungary, and Latvia. Bandar says the company is now in talks to sell the technology to law firms, banks, and insurance companies, bringing tests into workplace interviews and fraud screenings.

Bandar and O’Shea spent years adapting the core algorithm for use in various settings. They tried marketing it to police departments in the Manchester and Liverpool metropolitan areas. “We are talking to very senior people informally,” the company told UK publication The Engineer in 2003, noting that their aim was “to trial this in real interviews.” A 2013 white paper O’Shea published on his website suggested that Silent Talker “could be used to protect our forces on overseas deployment from Green-on-Blue (‘Insider’) attacks.” (The term “green-on-blue” is commonly used to refer to attacks Afghan soldiers in uniform make against their erstwhile allies.)

The team also published experimental results showing how Silent Talker could be used to detect comprehension as well as deception. In a 2012 study, the first to show the Silent Talker system used in the field, the team worked with a health-care NGO in Tanzania to record the facial expressions of 80 women as they took online courses on HIV treatment and condom use. The idea was to determine whether patients understood the treatment they would be getting—as the introduction to the study notes, “the assessment of participants’ comprehension during the informed consent process still remains a critical area of concern.” When the team cross-referenced the AI’s guesses about whether the women understood the lectures with their scores on brief post-lecture exams, they found it was 80% accurate in predicting who would pass and who would fail.

The Tanzania experiment was what led to Silent Talker’s inclusion in iBorderCtrl. In 2015, Athos Antoniades, one of the organizers of the nascent consortium, emailed O’Shea, asking if the Silent Talker team wanted to join a group of companies and police forces bidding for an EU grant. In previous years, growing vehicle traffic into the EU had overwhelmed agents at the union’s border countries, and as a result the EU was offering €4.5 million ($5 million) to any institution that could “deliver more efficient and secure land border crossings ... and so contribute to the prevention of crime and terrorism.” Antoniades thought Silent Talker could play a crucial part.

When the project finally announced a public pilot in October 2018, the European Commission was quick to tout the “success story” of the system’s “unique approach” to deception detection in a press release, explaining that the technology “analyses the micro-gestures of travelers to figure out if the interviewee is lying.” The algorithm trained in Manchester would, the press release continued, “deliver more efficient and secure land border crossings” and “contribute to the prevention of crime and terrorism.”

The program’s underlying algorithm, O’Shea told me, could be used in a variety of other settings—advertising, insurance claim analysis, job applicant screening, and employee assessment. His overwhelming belief in its wisdom was hard for me to share, but even as he and I spoke over the phone, Silent Talker was already screening volunteers at EU border crossings; the company had recently launched as a business in January 2019. So I decided to go to Manchester to see for myself.

Silent Talker’s offices sit about a mile away from Manchester Metropolitan University, where O’Shea is now a senior lecturer. He has taken over the day-to-day development of the technology from Bandar. The company is based out of a blink-and-you’ll-miss-it brick office park in a residential neighborhood, down the street from a kebab restaurant and across from a soccer pitch. Inside, Silent Talker’s office is a single room with a few computers, desks with briefcases on them, and explanatory posters about the technology from the early 2000s.

When I visited the company’s office in September, I sat down with O’Shea and Bandar in a conference room down the hall. O’Shea was stern but slightly rumpled, bald except for a few tufts of hair and a Van Dyke beard. He started the conversation by insisting that we not talk about the iBorderCtrl project, later calling its critics “misinformed.” He spoke about the power of the system’s AI framework in long, digressive tangents, occasionally quoting the computing pioneer Alan Turing or the philosopher of language John Searle.

“Machines and humans both have intentionality—beliefs, desires, and intentions about objects and states of affairs in the world,” he said, defending the system’s reliance on an algorithm. “Therefore, complicated applications require you to give mutual weight to the ideas and intentions of both.”

O’Shea demonstrated the system by having it analyze a video of a man answering questions about whether he stole $50 from a box. The program superimposed a yellow square around the man’s face and two smaller squares around his eyes. As he spoke, a needle in the corner of the screen moved from green to red when he gave false answers, and back to a moderate orange when he wasn’t speaking. When the interview was over, the software generated a graph plotting the probability of deception against time. In theory, this showed when he started and stopped lying.

The system can run on a traditional laptop, O’Shea says, and users pay around $10 per minute of video analyzed. O’Shea told me that the software does some preliminary local processing of the video, sends encrypted data to a server where it is further analyzed, and then sends the results back: the user sees a graph of the probability of deception overlaid across the bottom of the video.

According to O’Shea, the system monitors around 40 physical “channels” on a participant’s body—everything from the speed at which one blinks to the angle of one’s head. It brings to each new face a “theory” about deception that it has developed by viewing a training data set of liars and truth tellers. Measuring a subject’s facial movements and posture changes many times per second, the system looks for movement patterns that match those shared by the liars in the training data. These patterns aren’t as simple as eyes flicking toward the ceiling or a head tilting toward the left. They’re more like patterns of patterns, multifaceted relationships between different motions, too complex for a human to track—a typical trait of machine-learning systems.

The AI’s job is to determine what kinds of patterns of movements can be associated with deception. “Psychologists often say you should have some sort of model for how a system is working,” O’Shea told me, “but we don’t have a functioning model, and we don’t need one. We let the AI figure it out.” However, he also says the justification for the “channels” on the face comes from academic literature on the psychology of deception. In a 2018 paper on Silent Talker, its creators say their software “assumes that certain mental states associated with deceptive behavior will drive an interviewee’s [non-verbal behavior] when deceiving.” Among these behaviors are “cognitive load,” or the extra mental energy it supposedly takes to lie, and “duping delight,” or the pleasure an individual supposedly gets from telling a successful lie.

photograph of Paul Ekman — Paul Ekman, a psychologist whose theory of “micro-expressions” is much disputed, has consulted for myriad US government agencies.
Wikimedia / Momopuppycat

But Ewout Meijer, a professor of psychology at Maastricht University in the Netherlands, says that the grounds for believing such behaviors are universal are unstable at best. The idea that one can find telltale behavioral “leakages” in the face has roots in the work of Paul Ekman, an American psychologist who in the 1980s espoused a now-famous theory of “micro-expressions,” or involuntary facial movements too small to control. Ekman’s research made him a best-selling author and inspired the TV crime drama Lie to Me. He consulted for myriad US government agencies, including DHS and DARPA. Citing national security, he has kept research data secret. This has led to contentious debate about whether micro-expressions even carry any meaning.

Silent Talker’s AI tracks all kinds of facial movement, not Ekman-specific micro-expressions. “We decomposed these high level cues into our own set of micro gestures and trained AI components to recombine them into meaningful indicative patterns,” a company spokesperson wrote in an email. O’Shea says this enables the system to spot deceptive behavior even when a subject is just looking around or shifting in a chair.

“A lot depends on whether you have a technological question or a psychological question,” Meijer says, cautioning that O’Shea and his team may be looking to technology for answers to psychological questions about the nature of deception. “An AI system may outperform people in detecting [facial expressions], but even if that were the case, that still doesn’t tell you whether you can infer from them if somebody is deceptive … deception is a psychological construct.” Not only is there no consensus about which expressions correlate with deception, Meijer adds; there is not even a consensus about whether they do. In an email, the company said that such critiques are “not relevant” to Silent Talker and that “the statistics used are not appropriate.”

lie to me tv poster — The television drama *Lie to Me* was based in part on Ekman's micro-expression theory.
Fox studios

Furthermore, Meijer points out that the algorithm will still be useless at border crossings or in job interviews unless it’s been trained on a data set as diverse as the one it will be evaluating in real life. Research shows that facial recognition algorithms are worse at recognizing minorities when they have been trained on sets of predominantly white faces, something O’Shea himself admits. A Silent Talker spokesperson wrote in an email, “We conducted multiple experiments with smaller varying sample sizes. These add up to hundreds. Some of these are academic and have been publish [sic], some are commercial and are confidential.”

However, all the published research substantiating Silent Talker’s accuracy comes from small and partial data sets: in the 2018 paper, for instance, a training population of 32 people contained twice as many men as women and only 10 participants of “Asian/Arabic” descent, with no black or Hispanic subjects. While the software presently has different “settings” for analyzing men and women, O’Shea said he wasn’t certain whether it needed settings for ethnic background or age.

After the pilot of iBorderCtrl was announced in 2018, activists and politicians decried the program as an unprecedented, Orwellian expansion of the surveillance state. Sophie in ’t Veld, a Dutch member of the European Parliament and leader of the center-left Democrats 66 party, said in a letter to the European Commission that the Silent Talker system could violate “the fundamental rights of many border-crossing travelers,” and organizations like Privacy International condemned it as “part of a broader trend towards using opaque, and often deficient, automated systems to judge, assess, and classify people.” The opposition seemed to catch the iBorderCtrl consortium by surprise: though initially the European Commission claimed that iBorderCtrl would “develop a system to speed up border crossings,” a spokesperson now says the program was a purely theoretical “research project.” Antoniades told a Dutch newspaper in late 2018 that the deception-detection system “may ultimately not make it into the design,” but, as of this writing, Silent Talker was still touting its participation in iBorderCtrl on its website.

Silent Talker is “a new version of the old fraud,” opines Vera Wilde, an American academic and privacy activist who lives in Berlin, and who helped start a campaign against iBorderCtrl. “In some ways, it’s the same fraud, but with worse science.” In a polygraph test, an examiner looks for physiological events thought to be correlated with deception; in an AI system, examiners let the computer figure out the correlations for itself. “When O’Shea says he doesn’t have a theory, he’s wrong,” she continues. “He does have a theory. It’s just a bad theory.”

However often critics like Wilde debunk it, the dream of a perfect lie detector just won’t die, especially when glossed over with the sheen of AI. After DHS spent millions of dollars funding deception research at universities in the 2000s, it tried to create its own version of a behavior-analysis technology. This system, called Future Attribute Screening Technology (FAST), aimed to use AI to look for criminal tendencies in a subject’s eye and body movements. (An early version required interviewees to stand on a Wii Fit balance board to measure changes in posture.) Three researchers who spoke off the record to discuss classified projects said that the program never got off the ground—there was too much disagreement within the department over whether to use Ekman’s micro-expressions as a guideline for behavior analysis. The department wound down the program in 2011.

Despite the failure of FAST, DHS still shows interest in lie detection techniques. Last year, for instance, it awarded a $110,000 contract to a human resources company to train its officers in “detecting deception and eliciting response” through “behavioral analysis.” Other parts of the government, meanwhile, are still throwing their weight behind AI solutions. The Army Research Laboratory (ARL) currently has a contract with Rutgers University to create an AI program for detecting lies in the parlor game Mafia, as part of a larger attempt to create “something like a Google Glass that warns us of a couple of pickpockets in the crowded bazaar,” according to Purush Iyer, the ARL division chief in charge of the project. Nemesysco, an Israeli company that sells AI voice-analysis software, told me that its technology is used by police departments in New York and sheriffs in the Midwest to interview suspects, as well as by debt collection call centers to measure the emotions of debtors on phone calls.

The immediate and potentially dangerous future of AI lie detection is not with governments but in the private market. Politicians who support initiatives like iBorderCtrl ultimately have to answer to voters, and most AI lie detectors could be barred from court under the same legal precedent that governs the polygraph. Private corporations, however, face fewer constraints in using such technology to screen job applicants and potential clients. Silent Talker is one of several companies that claim to offer a more objective way to detect anomalous or deceptive behavior, giving clients a “risk analysis” method that goes beyond credit scores and social-media profiles.

A Montana-based company called Neuro-ID conducts AI analysis of mouse movements and keystrokes to help banks and insurance companies assess fraud risk, assigning loan applicants a “confidence score” of 1 to 100. In a video the company showed me, when a customer making an online loan application takes extra time to fill out the field for household income, moving the mouse around while doing so, the system factors that into its credibility score. It’s based on research by the company’s founding scientists that claims to show a correlation between mouse movements and emotional arousal: one paper asserts that “being deceptive may increase the normalized distance of movement, decrease the speed of movement, increase the response time, and result in more left clicks.” The company’s own tests, though, reveal that the software generates a high number of false positives: in one case study where Neuro-ID processed 20,000 applications for an e-commerce website, fewer than half the applicants who got the lowest scores (5 to 10) turned out to be fraudulent, and only 10% of those who received scores from 20 to 30 represented a fraud risk. By the company’s own admission, the software flags applicants who may turn out to be innocent and lets the company use that information to follow up how it pleases. “There’s no such thing as behavior-based analysis that’s 100% accurate,” a spokesperson told me. “What we recommend is that you use this in combination with other information about applicants to make better decisions and catch [fraudulent clients] more efficiently.”
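A rough back-of-the-envelope sketch shows why a screening tool can flag mostly innocent people even when its error rates sound modest. Every number here is hypothetical and none of them are Neuro-ID's figures; the function name `share_of_flags_that_are_fraud` is invented for illustration. The sketch applies Bayes' rule to an assumed fraud rate, an assumed detection rate, and an assumed false-alarm rate.

```python
def share_of_flags_that_are_fraud(base_rate, hit_rate, false_alarm_rate):
    """Return the fraction of flagged applicants who are actually fraudulent.

    base_rate:        assumed share of all applicants who are fraudulent
    hit_rate:         assumed share of fraudulent applicants the tool flags
    false_alarm_rate: assumed share of honest applicants the tool flags
    """
    flagged_fraud = base_rate * hit_rate
    flagged_honest = (1 - base_rate) * false_alarm_rate
    return flagged_fraud / (flagged_fraud + flagged_honest)


# Hypothetical scenario (not real data): 2% of applicants are fraudulent,
# and the tool is right 85% of the time about both groups.
precision = share_of_flags_that_are_fraud(
    base_rate=0.02, hit_rate=0.85, false_alarm_rate=0.15
)
print(f"{precision:.1%} of flagged applicants are fraudulent")
```

Under these assumed numbers, only about one in ten flagged applicants would actually be fraudulent, because honest applicants vastly outnumber fraudulent ones. The rarer fraud is in the population being screened, the more the flags are dominated by false alarms.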

Converus, a Utah-based startup, sells software called EyeDetect that measures the dilation of a subject’s pupils during an interview to detect cognitive load. Like Silent Talker, the tool starts from the premise that lying is more cognitively demanding than telling the truth. According to a 2018 article in Wired, police departments in Salt Lake City and Columbus, Georgia, have used EyeDetect to screen job applicants. Converus also told Wired that McDonald’s, Best Western, Sheraton, IHOP, and FedEx used its software in Panama and Guatemala in ways that would have been illegal in the US.

In a statement it gave me, the company cited a few studies that show EyeDetect achieving around 85% accuracy in identifying liars and truth tellers, with samples of up to 150 people. Company president Todd Mickelsen says that his firm’s algorithm has been trained on hundreds of thousands of interviews. But Charles Honts, a professor of psychology at Boise State University who also serves on Converus’s advisory board, said these results didn’t prove that EyeDetect could be relied upon in field interviews. “I find the EyeDetect system to be really interesting, but on the other hand, I don’t use it,” he told me. “I think the database is still relatively small, and it comes mostly from one laboratory. Until it’s expanded and other people have replicated it, I’d be reluctant to use it in the field.”
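Honts's caution about small databases can be made concrete with a minimal sketch of sampling uncertainty. The counts below are hypothetical (128 correct calls out of 150 subjects, about 85%) and are not taken from any Converus study. The sketch uses the standard Wilson score interval for a proportion, and the helper name `wilson_interval` is an illustrative choice.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical study: 128 of 150 subjects classified correctly (about 85%).
low, high = wilson_interval(128, 150)
print(f"Plausible accuracy range: {low:.1%} to {high:.1%}")
```

With a sample this size, the plausible range for the true accuracy spans roughly 79% to 90% from sampling noise alone. The interval says nothing about whether a result from one laboratory carries over to field interviews, which is the separate concern Honts raises.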

The University of Arizona researchers who developed the AVATAR system have also started a private company, Discern Science, to market their own deception-detection technology. Launched last year, Discern sells a six-foot-tall kiosk similar to the original AVATAR; according to an article in the Financial Times, the company has “entered into a joint venture agreement with a partner in the aviation industry” to sell the tool to airports. The system measures facial movement and voice stress to “invisibly gather information from the subject at a conversational distance,” according to promotional materials. Like Silent Talker and Converus, Discern claims the technology can reliably detect around 85% of liars and truth tellers, but again, its results have never been independently replicated. At least one of the inputs the kiosk uses has been repeatedly shown to be untrustworthy. (Honts further noted that “there’s almost no support” for facial movement analysis like AVATAR’s and Silent Talker’s—“there have been so many failures to replicate there,” he said.)

When questioned about the scientific backing for the company’s kiosk, Discern researcher Judee Burgoon emphasized that they merely make assessments, not binding judgments about truth and falsehood. Systems like AVATAR and Silent Talker, she said, “cannot directly measure deceit,” adding that “anyone who tells you they have a device that is a straight-out lie detector is a charlatan.” In marketing materials, though, Discern does present the tool as a reliable lie detector: the company’s website claims that it “can aid in uncovering hidden plans” and that its algorithms “have been scientifically proven to detect deception faster and more reliably than any alternative.”

The appeals court vacated Emmanuel Mervilus’s conviction in 2011, releasing him from prison and ordering a retrial; he had served more than three years of his sentence. At the second trial, in 2013, the jurors deliberated for only 40 minutes before acquitting him. Were it not for the polygraph, and the persistent belief in its accuracy, he might never have set foot in a courtroom the first time. Mervilus has sued the police officers who originally arrested and interrogated him, alleging that they violated his right to due process by using polygraph tests they knew were faulty in order to secure a conviction. The case will proceed to a settlement conference on March 13.

Even if the widespread use of Silent Talker and systems like it doesn’t lead to more convictions of innocent people like Mervilus, it could still help create a new kind of social shibboleth, forcing people to undergo “credibility assessments” before renting a car or taking out a loan.

“In a court, you have to give over material evidence, like your hair and your blood,” says Wilde. “But you also have a right to remain silent, a right not to speak against yourself.” Mervilus opted to take the polygraph test on the assumption that, like a DNA test, it would show he was innocent. And although the device got it wrong, it wasn’t the machine itself that sent him to prison. It was the jury’s belief that the test results were more credible than the facts of the case.

The foundational premise of AI lie detection is that lies are there to be seen with the right tools. Psychologists still don’t know how valid that claim is, but in the meantime, the belief in its validity may be enough to disqualify deserving applicants for jobs and loans, and to prevent innocent people from crossing national borders. The promise of a window into the inner lives of others is too tempting to pass up, even if nobody can be sure how clear that window is.

“It’s the promise of mind-reading,” says Wilde. “You can see that it’s bogus, but that’s what they’re selling.”