Smart Assistant Listens to You Talk, Fetches Info Automatically

Expect Labs hopes to reinvent the phone call by providing real-time search results.

Jessica Leber

January 17, 2013

Apple’s voice-activated assistant Siri works by pulling up information that a person asks for. A startup called Expect Labs is skipping the asking step.

The 10-person company is developing what it calls an “always-on Siri”—a technology that listens to a phone call between two or more people, interprets the conversation as it happens, and brings up what it thinks is useful information.

In the next few weeks, the San Francisco-based company plans to launch its first product, MindMeld, an iPad app for making video and voice calls. It also intends to license its “anticipatory computing” engine to businesses this year, and this could give speech apps on tablets, phones, car dashboards, and elsewhere new capabilities. At a large workplace, for example, a business could build software that pulls up old meeting notes during conference calls by accessing document servers and calendars. A call center company could use it to bring up purchase histories as representatives talk to customers.

“This is contextual, continuous, predictive search that occurs alongside a real-time conversation,” says CEO Timothy Tuttle, an entrepreneur and computer scientist who launched Expect Labs in 2011. The company is backed by investors including Google Ventures and Greylock Partners, and MindMeld will be the first product on the market to have these combined abilities, Tuttle says.

The startup demonstrated MindMeld at the Consumer Electronics Show in Las Vegas this month. Users can sign up or log in through Facebook and hold free video or voice calls with up to eight people through the app. If a participant taps on a button in the app during a call, MindMeld will review the previous 15 to 30 seconds of conversation by relying on Nuance’s voice recognition technology. It will identify key terms in context—in a discussion to find a sushi restaurant, for example, or one about a big news story that day—and then search Google News, Facebook, Yelp, YouTube, and a few other sources for relevant results. Images and links are displayed in a stream for the person who tapped the button to review. With a finger swipe, he or she can choose to share a result with others on the call.
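The tap-to-search flow described above can be pictured with a minimal Python sketch. It is an illustration only and does not reflect Expect Labs' actual implementation: the names `TranscriptWindow` and `key_terms` are hypothetical, the transcript is assumed to arrive as timestamped text from a speech recognizer, and plain word frequency stands in for whatever contextual models the real engine uses.

```python
from collections import Counter, deque

# Small illustrative stopword list (an assumption, not a real product's list).
STOPWORDS = {
    "a", "an", "the", "i", "you", "we", "did", "do", "get", "let's",
    "know", "see", "good", "last", "to", "of", "and", "in", "for",
}


class TranscriptWindow:
    """Keeps only the most recent seconds of recognized speech (hypothetical)."""

    def __init__(self, window_seconds=30.0):
        self.window_seconds = window_seconds
        self._segments = deque()  # (timestamp_in_seconds, recognized_text)

    def add(self, timestamp, text):
        """Store a recognized segment and discard anything older than the window."""
        self._segments.append((timestamp, text))
        cutoff = timestamp - self.window_seconds
        while self._segments and self._segments[0][0] < cutoff:
            self._segments.popleft()

    def recent_text(self, now):
        """Return the text spoken during the window that ends at `now`."""
        cutoff = now - self.window_seconds
        return " ".join(text for ts, text in self._segments if ts >= cutoff)


def key_terms(text, top_n=3):
    """Pick the most frequent non-stopword terms (a stand-in for real models)."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


# Simulated call: the user taps the button 55 seconds in.
window = TranscriptWindow(window_seconds=30.0)
window.add(0.0, "Did you see the game last night?")
window.add(40.0, "Let's get sushi tonight.")
window.add(50.0, "I know a good sushi place downtown.")

recent = window.recent_text(now=55.0)
terms = key_terms(recent)
print(terms)  # these terms would then be sent to the search sources
```

In this sketch the earlier remark about the game has already aged out of the 30-second window, so only the restaurant discussion contributes search terms.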

Tuttle views MindMeld mostly as a way to test the technology before it’s made more widely available to others—there’s no plan to show ads, and there’s only a small download fee. The app requires users to press a button to trigger its listening features so they won’t be overwhelmed with a stream of search results when they don’t need them, Tuttle says. However, the technology platform the company plans to license can listen in continuously to conversations of any length as it runs predictive models to find search results that are relevant to the discussion. Its information sources can be any to which a software developer has access.

The technology is similar to some of Google’s latest products. Like MindMeld, Google Now, a feature of Google’s Android operating system, works to find relevant information on mobile devices without a person asking (see “Google’s Answer to Siri Thinks Ahead”). Google Now makes predictions based on a person’s location, e-mail and Web search history. As the search giant readies to launch Google Glass, its goggle-like wearable computer, such hands-free interaction modes that can simply run in the background will become even more of a necessity.

“They’ve really hit a nice niche,” Anind Dey, a Carnegie Mellon University human-computer interaction researcher, says of Expect Labs’ technology. “That it doesn’t require explicit interaction is really quite exciting.”

The software is limited by the abilities of voice recognition technology, which can be hit or miss. But Tuttle says MindMeld can tolerate some inaccuracies and still call up relevant information.

Some may find using a calling app that listens in a little creepy. But Tuttle says the company doesn’t store audio data from any conversations, and it stores the keyword terms it extracts only if people allow it in their settings.

Dey, who is working to develop a wearable Bluetooth microphone that could analyze a person’s everyday interactions, imagines that such technology could one day process in-person conversations, not just phone calls made through an app.

Tuttle also expects his startup’s technology to evolve over several years into a “general-purpose conversation assistant.” He says Expect Labs has been approached by major mobile companies, as well as car companies, which are now turning their vehicles into software platforms (see “GM and Ford Open Up Their Vehicles to App Developers”).

“They know this is the future of how people will use their devices. The idea that we are going to have to type keywords into a search box to find information—they know that’s going to go away,” says Tuttle.
