Look, Smartphone: No Hands!

Controlling an iPhone or Android phone with just your voice and a noise-cancelling headset is doable, but frustrating.

Rachel Metz

March 17, 2014

I usually enjoy making fun of people who walk around wearing Bluetooth headsets, seemingly talking to themselves. So of course I felt like a hypocrite last week wandering around downtown San Francisco doing exactly that.

**So controlling:** There are plenty of things you can do on your smartphone via voice control. Texting isn’t always that easy, though.

I had an excuse, though. The rise of wearable gadgets means touch displays are getting ever smaller, and in some cases they may not be the best way to interact with these new devices. Voice-activated assistants like Siri and Google Now, meanwhile, are becoming increasingly popular. So I wanted to see how easy it would be to control both an iPhone and an Android smartphone with my voice, without having to touch them (spoiler: not very, but voice control does show promise).

For the experiment, I used Jawbone’s new Era Bluetooth headset ($100), which has noise-cancelling technology meant to help it pick up your voice even in loud places, and which can control both Siri and Google Now without your taking your phone out of your pocket (though you’ll still have to press a button on the headset itself). It also supports wideband audio (aka HD voice), which is emerging on some handsets and networks and can make speech recognition easier.

The Era is extremely compact—a bit less than two inches long, about half an inch tall—and weighs just six grams. Like Jawbone’s other products, it has a sleek, high-fashion look: it’s a faceted bar with a power switch hidden next to the earpiece and a single button on its rear end. The matte black one I tested nestled close to my cheek, easily hidden if I wore my hair down (which I did when using it in public, since even though it’s good-looking relative to other Bluetooth headsets, I didn’t want to show it off).

As with other wearable technology, power is a major concern. You wouldn’t be able to get through an entire day of ordering your smartphone around via the Era, as you get just four hours of talk time on a charge. But realistically, you probably wouldn’t be using it nonstop, and buying it with its optional charging case ($130 for the pair) will give you about six more hours of juice if you plug it into the case when not using it.

Jawbone has created an app for iPhones and Android devices that lets you customize some of the Era’s functions. I set it to know that one long press on the button meant I wanted to use either Google Now or Siri, depending on which smartphone it was connected to at the time.

First I tried out the Era with Siri on my own iPhone 5S. In the middle of the day I headed to San Francisco’s Union Square—a bustling shopping district—and started talking to my phone, which was hidden in my back pocket. The Era was able to pick up my voice so Siri could accurately respond to my commands on crowded streets and in busy stores. I had it read my work e-mail aloud and composed a response for it to send; I had it post undoubtedly clever tweets and define words for me while I walked through a busy shopping mall and its surrounding neighborhood.

It was easier to interact with my iPhone this way than by holding down the button on its face to summon Siri, and I was impressed by how much I could get done without even looking at the phone’s screen, which my eyes are normally glued to. Although I felt weird talking to my phone in public, I could imagine using the Era to interact with it and other gadgets at home, especially in the kitchen when my hands might not be free.

Siri still had a hard time understanding some things, especially when I tried to play music by musicians like Ferraby Lionheart and CeeLo Green, or used words with “ee” sounds. In one particularly vexing exchange about an upcoming party, I learned that Siri really doesn’t like the word “theme,” at least not the way I pronounce it. Instead, I got “FEMA,” “Tina,” and “fee” (twice).

Despite some difficulties recognizing artists’ names, the Era was best for simply playing music, as it has excellent sound quality and I could use Siri to skip tracks and pick artists (when I was multitasking, having tunes in one ear was fine). Adjusting sound or switching tracks was kind of a pain, though: to turn sound up or down via the Era, you have to hold down its one button and let the volume cycle all the way down and then all the way up, releasing your finger when it gets to the right level.

**All ears:** Jawbone’s new Era Bluetooth headset can be used with an iPhone or Android smartphone to control Siri or Google Now.

Then it was on to testing the Era with an Android smartphone. I quickly realized that Google Now’s ability to understand what I was saying was superior to Siri’s, but it still had some problems doing things like creating and sending messages if I didn’t enunciate as clearly as possible.

The Era also had trouble launching Google Now if the phone was asleep. Holding the button would bring me into the phone’s voice dialer; I had to do a short button press to cancel that before another long press brought up the general voice search that let me do things like check my appointments and get directions.

Whether on the iPhone or Android, the Era did a good job of cancelling noise when I was walking or standing still in crowds, but when I was cycling the wind generally drowned out my attempts to tell it to do things like play music or place a phone call. This is important. If your voice is the only way to control a gadget like a smart watch or head-worn computer, you will need the microphone to be robust enough to counteract wind so you can use it for all kinds of outdoor activities.

Oddly, both handsets had a really hard time understanding when I tried to dial sources and editors who weren’t in my address book, resulting in a lot of frustration. And when I finally got through, while I could hear the person on the other end just fine, several of them complained that I was hard to understand or cutting in and out. In two of those cases, I was alone in my office or at home, so it wasn’t an issue of background noise.

To get the perspective of someone who’s been in the speech-recognition trenches long enough to know how far the technology has come (and how far it still has to go), I called Jim Glass, who heads MIT’s Spoken Language Systems Group and studies automatic speech recognition and spoken-language understanding. (I actually called him first using the Era and the Android handset, but he said he couldn’t hear me well, so I hung up and called him back from my landline.)

Glass does think that as gadgets get tinier, voice will become an increasingly natural way to interact with them. Still, he says that while speech recognition will improve, there will continue to be people for whom it won’t work well, such as nonnative speakers of the language being recognized. For this and other reasons, he thinks it’s best if wearable gadgets offer multiple ways to interact. People might not mind chattering alone in the car, but not everyone is comfortable doing so on the bus.

“I think giving people choice is always the best option when you can,” he says.

I agree, in part because I couldn’t shake the feeling that I looked completely bizarre muttering to myself while using the Era in public. I suspect erasing that feeling will prove to be even more difficult than improving the speech recognition.
