Apple’s New iPhone Would Like to Talk

An AI personal assistant called Siri is the biggest new feature of the iPhone 4S.

Tom Simonite

October 4, 2011

In 2007, Apple cofounder Steve Jobs told the world that his company’s phones would be controlled with “the best pointing device in the world … our fingers.”

Today, his company announced that users of the next iPhone, the 4S, will be able to use their voices to control it, too.

Holding down the “home” button on the new iPhone 4S, available in the U.S. starting on October 14, summons a “personal assistant” known as Siri that can understand commands given in English, French, or German. It responds in a conversational style in both text and synthesized speech.

Today’s event was the first Apple launch presided over by Tim Cook, who recently became CEO after the company’s cofounder, Steve Jobs, stepped down for medical reasons. Cook opened the presentation in the now-familiar style established by Jobs, teasing journalists in attendance by spending time boasting of sales figures and new retail store openings.

Cook wasn’t on stage for the biggest news of the day, however, leaving it to Phil Schiller, Apple’s senior vice president for worldwide product marketing, to introduce Siri. Cook returned to the stage to round up the day’s news, declaring himself “so incredibly proud of this company.” He didn’t make any reference to Steve Jobs.

Demonstrations on stage at the launch event at Apple’s headquarters in Cupertino, California, this morning saw Siri handling questions including, “What is the weather like today?” to which it responded by displaying and speaking a forecast for the owner’s current location. When the question was posed more conversationally, as “Do I need a raincoat today?” Siri responded in a similar manner: “It sure looks like rain today.”

Siri draws on a number of online sources of information, from weather feeds to local business reviews, as well as the question-and-answer search engine Wolfram Alpha. Siri is able to find restaurants in response to a query such as “Find me a great Greek restaurant in Palo Alto,” and even offers to book a table. Wolfram Alpha allows Siri to handle requests like “How many dollars is 45 euros?” The personal assistant is also able to take charge of a person’s phone and do things like set alarms or meeting reminders in response to commands like “wake me up at 6 a.m. tomorrow.”

The technology behind Siri originated in a DARPA-funded research project conducted at private research lab SRI International, and was used to launch a startup company and the Siri iPhone app in 2009. Siri was one of Technology Review’s 10 technologies to watch in 2009, and Apple bought the company in 2010.

A separate new feature, Dictation, allows users of the new iPhone to use speech recognition to compose text messages or e-mails; this is already possible for users of Google’s Android software for phones and tablets.

Uncharacteristically for Apple, which prides itself on obsessively polished devices and software that “just work,” both Siri and the Dictation feature were labeled as “beta” products. One potential reason is that, like all voice-recognition technology, the software Apple has licensed from Nuance for both Siri and Dictation may not always be perfectly accurate.

Apart from Siri and Dictation, the new iPhone 4S is outwardly identical to the previous model, the iPhone 4, but has improved components inside, including a more powerful processor and an improved camera.

Norman Winarsky of SRI International worked on the project that spawned Siri and was cofounder of the company that launched its original app. He says the version Apple unveiled is now more powerful. “It’s not just connected to various Web services, but also to your calendar and contacts and music and everything on the phone,” he says. Putting Siri at the heart of a powerful and popular device represents “a paradigm shift in how people can interact with their device and other services,” he adds.

Winarsky says Siri’s speech-based interface is not its most impressive feature. “Recognizing speech has become a commodity. It is finding the intent in what you said and matching that with the Web services available that cost hundreds of millions in research.” Winarsky and colleagues at SRI made their technology capable of handling ambiguity and variability in statements, enabling Siri to deal with casual commands so that users don’t have to use carefully scripted phrasing, he says.
