A Tongue-Tracking Artificial Larynx

Patients could get their voice back using a device that analyzes contact between the tongue and palate.

Rachel Kremen

December 3, 2009

Researchers in South Africa are working on a new kind of artificial larynx that won’t have the raspy voice of existing devices. The system tracks contact between the tongue and palate to determine which word is being mouthed, and uses a speech synthesizer to generate sounds.

**Making contact:** The palatometer in the top image is normally used in speech therapy. Researchers in South Africa are training the device to recognize words mouthed by people who have had their larynx removed. The space-time graph in the bottom image corresponds to the tongue-palate contact pattern for the word “been.”

According to the National Cancer Institute, some 10,000 Americans are diagnosed with laryngeal cancer each year, and most patients with advanced cancer must have their voice box removed.

“All of the currently available devices produce such bad sound; it either sounds robotic or has a gruff speaking voice,” says Megan Russell, a PhD candidate at the University of the Witwatersrand in Johannesburg, South Africa. “We felt the tech was there for an artificial synthesized voice solution.”

The system uses a palatometer: a device that looks much like an orthodontic plate and is normally used for speech therapy. The device, made by CompleteSpeech of Orem, UT, tracks contact between the tongue and palate using 118 embedded touch sensors. The software for the artificial larynx was written by Russell and colleagues at the University of the Witwatersrand. Their work is being presented at the International Conference on Biomedical and Pharmaceutical Engineering this week in Singapore.

To use the device, a person puts the palatometer in her mouth and mouths words normally. The system tries to translate those mouth movements into words before reproducing them on a small sound synthesizer, perhaps tucked into a shirt pocket.

So far, Russell has trained the system to recognize 50 common English words by saying each word multiple times with the palatometer in her mouth. The information can be represented on a binary space-time graph and put into a database. Each time the user speaks, the contact patterns are compared against the database to identify the correct word.
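The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say how the Witwatersrand software stores or compares patterns, so the nearest-match rule used here (counting differing cells after resampling each pattern to a fixed number of frames), the function names, and the toy four-sensor data are all hypothetical. The real palatometer has 118 sensors.

```python
from typing import Dict, List

Frame = List[int]      # one time step: a 0/1 contact value per sensor
Pattern = List[Frame]  # one mouthed word: frames ordered in time


def resample(pattern: Pattern, n_frames: int) -> Pattern:
    """Pick n_frames evenly spaced frames so words of different
    durations can be compared cell by cell (an assumed preprocessing step)."""
    if not pattern:
        raise ValueError("pattern has no frames")
    if n_frames == 1:
        return [pattern[0]]
    last = len(pattern) - 1
    return [pattern[round(i * last / (n_frames - 1))] for i in range(n_frames)]


def hamming(a: Pattern, b: Pattern) -> int:
    """Count the cells where two equally sized binary patterns differ."""
    return sum(x != y for fa, fb in zip(a, b) for x, y in zip(fa, fb))


def identify(pattern: Pattern, database: Dict[str, Pattern], n_frames: int = 4) -> str:
    """Return the database word whose stored pattern differs least from the input."""
    probe = resample(pattern, n_frames)
    return min(database, key=lambda w: hamming(probe, resample(database[w], n_frames)))


# Toy database with 4 sensors instead of 118 (made-up values, not real recordings).
database = {
    "been": [[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 0]],
    "two": [[0, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1]],
}
spoken = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 0]]
print(identify(spoken, database))  # prints "been"
```

Resampling to a fixed length is one simple way to cope with the same word being mouthed faster or slower; the team's actual alignment method is not described in the article.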

Russell’s team has tested the word-identification system using a variety of techniques. One approach involves aligning and averaging the data produced while training the device for a few instances of a word to create a template for comparison. Another compares features such as the area of the data plots on the graph, and the center of mass on the X and Y axes. A voting system compares the results of selected methods to see whether there is agreement. The researchers have also tested a predictive-analysis system, which considers the last word mouthed to help determine the next.
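Two of the ideas in that paragraph, the plot features and the vote, are easy to illustrate. The Python sketch below is an assumption-laden toy, not the team's code: it treats a contact pattern as a grid of 0s and 1s with sensors on the X axis and time on the Y axis, and it assumes the vote is a simple majority with a hypothetical `min_agree` threshold.

```python
from collections import Counter
from typing import List, Optional, Tuple

Pattern = List[List[int]]  # rows = time frames (Y), columns = sensors (X), values 0/1


def area(pattern: Pattern) -> int:
    """Number of active cells in the space-time graph."""
    return sum(sum(frame) for frame in pattern)


def center_of_mass(pattern: Pattern) -> Tuple[float, float]:
    """Mean (x, y) position of the active cells."""
    total = area(pattern)
    if total == 0:
        raise ValueError("pattern contains no contact")
    x = sum(col for frame in pattern for col, v in enumerate(frame) if v) / total
    y = sum(row for row, frame in enumerate(pattern) for v in frame if v) / total
    return x, y


def vote(guesses: List[str], min_agree: int = 2) -> Optional[str]:
    """Return the word most classifiers agree on, or None when fewer than
    min_agree of them agree (threshold is an assumption for illustration)."""
    if not guesses:
        return None
    word, count = Counter(guesses).most_common(1)[0]
    return word if count >= min_agree else None


toy = [[1, 1, 0],
       [0, 1, 0]]
print(area(toy), center_of_mass(toy))
print(vote(["been", "been", "bin"]))  # two of three methods agree
print(vote(["been", "bin", "ban"]))   # no agreement, so no word is chosen
```

Returning no word when the methods disagree is one plausible way for such a system to stay silent rather than guess.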

Russell says that when the voting and predictive elements are combined, the system identifies the correct word 94.14 percent of the time, although this doesn’t include words that the system classifies as “unknown” and chooses to skip. Russell says that happens about 18 percent of the time. But choosing the wrong word “could lead to some very difficult social situations,” Russell says, so it’s best for the system to reject unclear words and remain silent.
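It helps to combine those two figures. If the 94.14 percent accuracy applies only to the words the system accepts, and about 18 percent of words are rejected, then roughly 77 percent of everything mouthed would be spoken correctly, about 5 percent would come out as the wrong word, and the rest would be met with silence. The short Python calculation below makes that arithmetic explicit; the function name is made up for illustration, and the result depends on reading the two figures as described.

```python
def outcome_rates(accuracy_on_accepted: float, reject_rate: float) -> dict:
    """Split all mouthed words into correct, wrong, and silent fractions,
    assuming accuracy is measured only over words the system accepts."""
    accepted = 1.0 - reject_rate
    correct = accepted * accuracy_on_accepted
    wrong = accepted - correct
    return {"correct": correct, "wrong": wrong, "silent": reject_rate}


rates = outcome_rates(accuracy_on_accepted=0.9414, reject_rate=0.18)
for outcome, share in rates.items():
    print(f"{outcome}: {share:.1%}")
```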

The team hopes to eliminate the palatometer’s ribbon cables, which run out of the user’s mouth, and instead create a system in which data is transmitted wirelessly from the palatometer to a speech synthesizer. The group also needs to improve upon the predictive analysis system and expand the database of words.

Russell’s team will also need to test many more subjects, including those without a larynx, before the device could become widely available.

“We also intend to implement a degree of user-controlled modulation of pitch and volume in order to achieve a more natural sound,” says David Rubin, an adjunct professor of electrical and information engineering at the University of the Witwatersrand and one of Russell’s advisors on the project. “For example, we intend the user to be able to achieve the typical upward inflection in the voice when asking a question as compared to making a statement.”

It is also important for the team to improve on the processing speed of the device. It now takes more than a second for the system to identify and play back the correct word. “Ultimately this time needs to be reduced to below 0.3 seconds in order for it to appear to observers that the person’s lips and voice are synchronized,” Russell says.

Researchers at the University of Hull in the U.K. are exploring a different approach to the problem. They place magnets in the mouth and use these to measure changes in the magnetic field around the mouth that correspond to movement. Currently, the Hull researchers use surgical glue to affix six magnets to the lips, throat, and tongue. Ultimately, the magnets would be implanted. “We are still working out where is the best place to put them,” says James Gilbert, a senior lecturer in engineering at Hull.

Like the Witwatersrand team, the Hull group hopes to eliminate much of the wiring and expand its dataset. Currently, the Hull system can identify only 10 words, with accuracy ranging from 70 to 100 percent.

However, Gilbert questions whether all words could ever be identified using a palatometer. “The analysis methods seem reasonable,” he says, but adds that “for some words, there’s very limited contact with the palate.”

The Witwatersrand team believes its design is superior because it doesn’t require surgical implants. “In the event that the overall idea proves to be feasible, we hope to be able to enter into agreements with the companies producing existing systems to enable us to move forward with this approach to an artificial larynx,” Rubin says.
