
How Shining a Laser on Your Face Might Help Siri Understand You

Startup VocalZoom is building a sensor that measures the vibrations of your face to make it easier for you to control technology with your voice.

From Siri to Alexa to Cortana, we’re talking to virtual assistants more than ever before. They can still have trouble understanding simple commands to play music or look up directions, though, especially in noisy places.

Rather than focusing on cleaning up the audio signal that captures your voice, Israeli startup VocalZoom thinks it might be possible to make all kinds of speech-recognition applications work a lot better by using a tiny, low-power laser that measures the itty-bitty vibrations of your skin when you speak.

The company, which has raised about $12.5 million in venture funding so far, is building a small laser sensor that it says will initially be embedded in headsets and helmets, where it will work alongside existing microphone-based speech-recognition technologies to reduce misunderstandings.

VocalZoom founder and CEO Tal Bakish thinks it will first be used for things like motorcycle helmets or headsets worn by warehouse workers—you might use it to ask for directions while riding your Harley, for instance. A Chinese speech-recognition company called iFlytek plans to have a prototype headset ready at the end of August. Bakish also expects it to be added to cars by 2018 for giving voice commands when you’re behind the wheel. The company has joint-development agreements with several automotive companies, though he won’t name them on the record, and he’s interested in getting the technology into smartphones, too.

At a noisy coffee shop in Boston, Bakish shows me a nonworking version of VocalZoom’s first product, slated to be ready this summer: a tiny sensor with a laser that shines directly at your face (he says it’s eye-safe according to U.S. Food and Drug Administration rules). If you were using one of these sensors in a headset to ask for directions to a restaurant, for instance, it would measure the velocity of your facial skin vibrations, while a regular audio signal would be captured by a microphone; software would then compare these two signals to come up with the best approximation of what you’re trying to say.
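VocalZoom hasn’t said publicly how its software combines the two channels. As a purely illustrative sketch, one classic fusion approach is to treat the nearly noise-free vibration signal as a reference for where speech energy actually is, and use it to build a Wiener-style mask over the noisy microphone spectrum. Everything below, from the synthetic signals to the parameter choices, is an assumption for illustration, not VocalZoom’s method.

```python
# Illustrative two-channel fusion sketch (not VocalZoom's algorithm).
# Idea: the laser vibrometer channel is nearly immune to acoustic noise,
# so its short-time spectrum can gate the noisy microphone spectrum.

import numpy as np
from scipy.signal import stft, istft

FS = 16_000                                   # sample rate in Hz (assumed)
rng = np.random.default_rng(0)

t = np.arange(FS) / FS                        # one second of audio
speech = np.sin(2 * np.pi * 150 * t)          # crude stand-in for voiced speech
mic = speech + 0.8 * rng.standard_normal(FS)      # microphone: speech + cafe noise
laser = speech + 0.05 * rng.standard_normal(FS)   # vibrometer: almost noise-free

# Short-time spectra of both channels (same framing for both).
_, _, MIC = stft(mic, fs=FS, nperseg=512)
_, _, LAS = stft(laser, fs=FS, nperseg=512)

# Wiener-style mask: keep microphone energy only where the laser channel
# indicates speech; (MIC - LAS) approximates the noise in each frame/bin.
eps = 1e-10
mask = np.abs(LAS) ** 2 / (np.abs(LAS) ** 2 + np.abs(MIC - LAS) ** 2 + eps)
_, enhanced = istft(mask * MIC, fs=FS, nperseg=512)

def snr_db(clean, signal):
    """Signal-to-noise ratio of `signal` against the known clean speech."""
    n = min(len(clean), len(signal))
    noise = signal[:n] - clean[:n]
    return 10 * np.log10(np.sum(clean[:n] ** 2) / np.sum(noise ** 2))

print(f"microphone SNR: {snr_db(speech, mic):5.1f} dB")
print(f"enhanced SNR:   {snr_db(speech, enhanced):5.1f} dB")
```

In a real system, the cleaned-up waveform, rather than the raw microphone feed, would then be handed to an ordinary speech recognizer like the ones behind Siri or Alexa.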

Bakish says VocalZoom’s sensor can measure skin vibrations anywhere from your eyes down to your throat and neck, and that it can also work from behind the head, for example by reading the vibrations behind your ears. The laser works at up to a meter away, though a distance of five centimeters is sufficient in, say, a headset.

Bakish says that when its sensor is used alongside more standard audio-only speech-recognition technology, VocalZoom has been able to cut speech-recognition error rates by 60 to 80 percent; a 70 percent cut would take a 10 percent word error rate down to 3 percent, for instance.

Abe Davis, a graduate student at MIT’s Computer Science and Artificial Intelligence Laboratory whose work has focused on gleaning audio from video by analyzing the tiny vibrations that various objects make, thinks it would be difficult to get VocalZoom to work in a car, where he suspects it could be hampered by things like your head moving around.

In a headset or helmet, however, he could see it being useful.

“It’s just a question of whether you can make sure the laser’s pointed at the right thing,” he says.
