Voice Analysis Tech Could Diagnose Disease

Researchers enlist smartphones and machine learning to find vocal patterns that might signal post-traumatic stress disorder or even heart disease.

Emily Mullin

January 19, 2017

In the near future, smartphone apps and wearables could help diagnose disease with short voice samples.

Charles Marmar has been a psychiatrist for 40 years, but when a combat veteran steps into his office for an evaluation, he still can’t diagnose post-traumatic stress disorder with 100 percent accuracy.

“You would think that if a war fighter came into my office I’d be able to decide if they have PTSD or not. But what if they’re ashamed to tell me about their problems or they don’t want to lose their high-security clearance, or I ask them about their disturbing dreams and they say they’re sleeping well?” says Marmar.

Marmar, who is chairman of the department of psychiatry at New York University's Langone Medical Center, is hoping to find answers in his patients' speech.

Voice samples are a rich source of information about a person’s health, and researchers think subtle vocal cues may indicate underlying medical conditions or help gauge disease risk. In a few years it may be possible to monitor a person’s health remotely, using smartphones and wearable devices, by recording short speech samples and analyzing them for disease biomarkers.

For psychiatric disorders like PTSD, there are no blood tests, and people are often embarrassed to talk about their mental health, so these conditions frequently go underdiagnosed. That’s where vocal tests could be useful.

As part of a five-year study, Marmar is collecting voice samples from veterans and analyzing vocal cues like tone, pitch, rhythm, rate, and volume for signs of invisible injuries like PTSD, traumatic brain injury (TBI), and depression. Machine-learning algorithms mine features in the voice, pick out vocal patterns in people with these conditions, and compare them with voice samples from healthy people.
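
To give a sense of what measuring cues like pitch and volume can involve, here is a minimal sketch in Python. It is not the method Marmar's team uses, which the article does not describe. It assumes a synthetic sine tone stands in for a short stretch of voiced speech, measures volume as root-mean-square amplitude, and estimates pitch with a plain autocorrelation search. The function names, the 8,000 Hz sample rate, and the 75 to 400 Hz search range are all illustrative assumptions.

```python
import math


def synth_tone(freq_hz, amplitude, sample_rate, duration_s):
    """Generate a pure sine tone as a stand-in for a voiced speech frame."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def rms_volume(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate pitch by finding the lag with the strongest autocorrelation.

    The search range (fmin to fmax) is an assumed range for adult speech.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Hypothetical input: a 200 Hz tone, 0.1 seconds long, sampled at 8,000 Hz.
SAMPLE_RATE = 8000
frame = synth_tone(200.0, 0.5, SAMPLE_RATE, 0.1)
volume = rms_volume(frame)
pitch = estimate_pitch(frame, SAMPLE_RATE)
print(f"volume (RMS): {volume:.3f}, pitch: {pitch:.1f} Hz")
```

Real speech is far messier than a pure tone, so a production system would likely need framing, noise handling, and many more features than these two.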

For example, people with mental or cognitive problems may elongate certain sounds, or struggle with pronouncing phrases that require complex facial muscle movements.

Collaborating with researchers at SRI International, a nonprofit research institute in northern California, Marmar has extracted 40,000 total features from the voices of veterans and control subjects and picked out a set of 30 vocal characteristics that seem to be associated with PTSD and TBI.
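
The article does not say how the 30 characteristics were chosen from the 40,000. One common and simple approach, shown below purely as an illustration, is to score each feature by how well it separates the two groups and keep the top few. The sketch assumes randomly generated data, a standardized mean difference as the score, and three hypothetical features with a planted group difference. None of it comes from the study.

```python
import random
import statistics


def effect_size(group_a, group_b):
    """Absolute standardized mean difference between two groups."""
    mean_gap = statistics.mean(group_a) - statistics.mean(group_b)
    pooled_sd = ((statistics.variance(group_a)
                  + statistics.variance(group_b)) / 2) ** 0.5
    return abs(mean_gap) / pooled_sd if pooled_sd > 0 else 0.0


def top_k_features(patients, controls, k):
    """Rank every feature column by effect size and return the k best indices."""
    n_features = len(patients[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row in patients]
        b = [row[j] for row in controls]
        scores.append((effect_size(a, b), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


# Synthetic data (an assumption, not study data): 200 noise features,
# with a planted shift in three of them for the "patient" group.
rng = random.Random(0)
N_FEATURES, N_PER_GROUP = 200, 30
INFORMATIVE = [5, 42, 117]

controls = [[rng.gauss(0, 1) for _ in range(N_FEATURES)]
            for _ in range(N_PER_GROUP)]
patients = [[rng.gauss(0, 1) for _ in range(N_FEATURES)]
            for _ in range(N_PER_GROUP)]
for row in patients:
    for j in INFORMATIVE:
        row[j] += 3.0  # planted group difference

selected = top_k_features(patients, controls, k=3)
print("selected feature indices:", selected)
```

With tens of thousands of candidate features and a small number of subjects, some features will separate the groups by chance alone, which is why selections like this generally need to be checked on voice samples that were not used to pick them.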

In early results presented in 2015, a voice test developed by Marmar and his team was 77 percent accurate at distinguishing between PTSD patients and healthy volunteers in a study of 39 men. More voice recordings have been collected since that study, and Marmar and his colleagues are close to identifying speech patterns that can distinguish between PTSD and TBI.
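
A figure like 77 percent carries a lot of uncertainty when it comes from only 39 people. The sketch below works through the arithmetic using a Wilson score interval, a standard way to put a range around a proportion. It assumes that 77 percent of 39 means 30 subjects were classified correctly and that a single overall accuracy was reported. The article gives neither the exact count nor the study design, so both are assumptions.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


n_subjects = 39
n_correct = round(0.77 * n_subjects)  # assumed count: 30 of 39
accuracy = n_correct / n_subjects
low, high = wilson_interval(n_correct, n_subjects)
print(f"{n_correct}/{n_subjects} correct = {accuracy:.1%}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval runs from roughly 62 percent to 87 percent, which helps explain why additional voice recordings matter before drawing firm conclusions.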

“Medical and psychiatric diagnosis will be more accurate when we have access to large amounts of biological and psychological data, including speech features,” Marmar says. To date, the U.S. Food and Drug Administration has not approved any speech tests to diagnose disease.

Beyond mental health, the Mayo Clinic is pursuing vocal biomarkers to improve remote health monitoring for heart disease. It’s teaming up with Israeli company Beyond Verbal to test the voices of patients with coronary artery disease, the most common type of heart disease. They reason that chest pain caused by hardening of the arteries may affect voice production.

In an initial study, the Mayo Clinic enrolled 150 patients and asked them to produce three short voice recordings using an app developed by Beyond Verbal. Researchers analyzed the voices using machine learning and identified 13 different vocal features associated with patients at risk of coronary artery disease.

One characteristic, related to the frequency of the voice, was associated with a 19-fold increase in the likelihood of coronary artery disease. Amir Lerman, a cardiologist and professor of medicine at the Mayo Clinic, says this vocal trait isn’t discernible to the human ear and can only be picked up using the app’s software.

“What we found out is that specific segments of the voice can be predictive of the amount or degree of the blockages found by the angiography,” Lerman says.

Lerman says a vocal test app on a smartphone could be used as a low-cost, predictive screening tool to identify patients most at risk of heart disease, as well as to remotely monitor patients after cardiac surgery. For example, changes in the voice could indicate whether patients have stopped taking their medication.

Next, Mayo plans to conduct a similar study in China to determine if the voice biomarkers identified in the initial study are the same in a different language.

Jim Harper, CEO of Sonde Health in Boston, sees value in using voice tests to monitor new mothers for postpartum depression, which is widely believed to be underdiagnosed, and older people with dementia, Parkinson’s, and other diseases of aging. His company is working with hospitals and insurance companies to set up pilot studies of its AI platform, which detects acoustic changes in the voice to screen for mental health conditions.

“We’re trying to make this ubiquitous and universal by engineering a technology that allows our software to operate on mobile phones and a range of other voice-enabled devices,” Harper says.

One major question researchers are working on is whether these vocal characteristics can be faked by patients. If they can, the tests might not be very reliable.

The technology also raises privacy and security concerns. Not all patients will want to give voice samples that contain personal information or let apps have access to their phone calls. Researchers insist that their algorithms are capturing patterns in the voice, not logging what you say.
