A World Wide Web that Talks

IBM builds a search engine aimed at the estimated fifth of the world’s population that cannot read.

Tom Simonite

February 16, 2011

Some 10,000 people worldwide use a version of the Web like no other: it is operated by voice over the telephone. Called the “Spoken Web,” it is the result of an IBM research project attempting to re-create the features and functions of the text-based World Wide Web for people with low levels of literacy and technical skills.

Four years after the first prototype was released, the spoken Web is part of everyday life for users in four Indian states and parts of Thailand and Brazil. They use it to learn about things such as local grain prices or job opportunities. On the spoken Web, telephone numbers replace Web addresses: a person can call a voice site and listen to or record content.

Now the project is going through a developmental stage that mirrors part of the regular Web’s history: the debut of search as a way to navigate a growing body of content.

“As the number of voice sites grows, and they get more content, people need a way to find what they want quickly,” says Nitendra Rajput, a senior researcher with IBM Research India. Rajput was an early collaborator on the spoken Web with project founder Arun Kumar.

A voice site has some structure: for example, a person who calls in to upload a site will interact with an automated telephone system that accepts voice commands and prompts the user to create a title and add sections of different information. However, listening to long voice messages is inefficient and costly, says Rajput.

“We want you to be able to speak a pesticide name, for example, to quickly find content about that,” he says. But designing a search engine that works that way is far from simple. Though voice-recognition technology can be used to take a person’s search term and match it against a previously processed index of recorded voice sites, presenting the results is a challenge. “We can’t have it read out a list of 20 results. It would take too long, and people would not remember them all,” says Rajput. “Instead it [must] tell the user it has that many, and ask how to narrow them down.”
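A minimal sketch of the matching step Rajput describes, assuming each recorded section has already been transcribed to text by a speech recognizer and that the caller's spoken query also arrives as text. The helpers `build_index`, `search`, and `announce` and the "site/section" identifiers are hypothetical; the article does not describe how IBM builds its index. The last function reflects the design point in the quote: announce how many results there are instead of reading them all out.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; assumes transcripts are plain text from a recognizer
    return re.findall(r"\w+", text.lower())


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of (hypothetical) section ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for section_id, text in transcripts.items():
        for token in tokenize(text):
            index[token].add(section_id)
    return index


def search(index: dict[str, set[str]], spoken_query: str) -> set[str]:
    """Return the sections that contain every word of the recognized query."""
    terms = tokenize(spoken_query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


def announce(hits: set[str]) -> str:
    # Report only the count and ask how to narrow, rather than reading a long list
    noun = "result" if len(hits) == 1 else "results"
    return f"I found {len(hits)} {noun}. How would you like to narrow them down?"
```

For example, with one section mentioning a pesticide's price and another advising when to spray it, `search(index, "endosulfan")` would return both section ids, and `announce` would tell the caller there are two results.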

The user is asked which category to filter the results by: for example, the name of the person who owns the site, the place where it was created, or the type of section in which the search term was found, such as one announcing news, asking a question, or answering one. This step is repeated until there are five or fewer results, at which point they are all read out to the user, who can choose which one to “browse” to.
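The narrowing dialogue can be sketched as a loop, assuming each result carries the metadata the article lists (owner, place, section type). The `VoiceResult` fields, the `choose` callback standing in for the caller's spoken answers, and the rule of offering only categories that still split the results are illustrative assumptions; the five-result cutoff is the one figure taken from the article.

```python
from dataclasses import dataclass
from typing import Callable

MAX_READ_OUT = 5  # from the article: results are read out once five or fewer remain
FACETS = ("owner", "place", "section_type")  # hypothetical field names


@dataclass(frozen=True)
class VoiceResult:
    site: str
    owner: str         # name of the person who owns the voice site
    place: str         # where the site was created
    section_type: str  # e.g. "news", "question", "answer"


# Stands in for the caller: given the categories on offer and their values,
# return the (category, value) pair the caller picked.
Chooser = Callable[[dict[str, set[str]]], tuple[str, str]]


def narrow(results: list[VoiceResult], choose: Chooser) -> list[VoiceResult]:
    """Filter by caller-chosen categories until few enough results remain to read out."""
    while len(results) > MAX_READ_OUT:
        values = {f: {getattr(r, f) for r in results} for f in FACETS}
        # Offer only categories that would actually split the remaining results
        options = {f: v for f, v in values.items() if len(v) > 1}
        if not options:
            break  # nothing left to narrow by; read out what remains
        facet, value = choose(options)
        results = [r for r in results if getattr(r, facet) == value]
    return results
```

A category whose value is the same for every remaining result is not offered, since picking it would not shorten the list. In a real voice interface, `choose` would be a spoken prompt followed by speech recognition of the caller's answer.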

Trials involving 40 farmers in the Indian state of Gujarat validated this design, which is to be rolled out across the whole spoken Web. As the spoken Web grows, though, it will need more ways to navigate its content, says Rajput, just as similar mechanisms were developed for the text-based Web.

Another improvement in the works provides a way to skim through voice sites. Users can already use a fast-forward function to hear a site at increased speed: it plays at 10 times normal speed, too fast to make out individual words, but slows down for certain important words or phrases. The effect is similar to skim-reading a text out loud, says Rajput, and it lets users find what they want very quickly.
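The skimming behaviour amounts to choosing a playback rate for each word. A minimal sketch, assuming word-level timings are available for a recording and that the set of important words is already known; the `Word` type and the `skim_schedule` and `listening_time` helpers are hypothetical, while the 10x rate is the figure given in the article.

```python
from dataclasses import dataclass

SKIM_RATE = 10.0    # fast-forward speed from the article: 10 times normal
NORMAL_RATE = 1.0   # important words are played back at normal speed


@dataclass(frozen=True)
class Word:
    text: str
    duration_s: float  # length at normal speed; assumes word-level alignment exists


def skim_schedule(words: list[Word], important: set[str]) -> list[tuple[Word, float]]:
    """Pair each word with a playback rate: normal for important words, fast otherwise."""
    # `important` is assumed to hold lowercase words
    return [
        (w, NORMAL_RATE if w.text.lower() in important else SKIM_RATE)
        for w in words
    ]


def listening_time(schedule: list[tuple[Word, float]]) -> float:
    """Seconds needed to play the words at their scheduled rates."""
    return sum(w.duration_s / rate for w, rate in schedule)
```

With five half-second words of which two are marked important, the schedule plays in 1.15 seconds instead of 2.5, yet the two important words are heard at normal speed.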

The researchers think the system could learn which words or phrases are important by looking at which particular phrases lead users to switch from fast-forward back to normal speed. “We are currently collecting the statistics from the users we have in order to know which words are important,” says Rajput.
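One way to turn such statistics into a word list is to count, for each word, how often hearing it during fast-forward is followed by a switch to normal speed. This is a hedged sketch of the idea Rajput describes, not IBM's method: the `(word, switched)` event format and the `min_heard` and `min_switch_rate` thresholds are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable


def important_words(
    events: Iterable[tuple[str, bool]],
    min_heard: int = 20,           # hypothetical: ignore words with too few observations
    min_switch_rate: float = 0.3,  # hypothetical: share of hearings that trigger a slowdown
) -> set[str]:
    """Pick words that often make listeners leave fast-forward.

    Each event is (word, switched): a word played during fast-forward, and whether
    the listener switched to normal speed right after it (an assumed logging format).
    """
    heard: Counter[str] = Counter()
    switched: Counter[str] = Counter()
    for word, did_switch in events:
        key = word.lower()
        heard[key] += 1
        if did_switch:
            switched[key] += 1
    return {
        w for w, n in heard.items()
        if n >= min_heard and switched[w] / n >= min_switch_rate
    }
```

The `min_heard` floor keeps a rare word that happened to precede a few slowdowns from being promoted on thin evidence; a very common word such as "the" is filtered out by its low switch rate instead.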

“So many people in the world have no idea how to use the Web or even to understand the text on it,” says Naushad UzZaman, a researcher at the University of Rochester in New York. “Although you cannot remove the digital divide, making it possible to get the benefits of the Web by voice is an example of how we can narrow it.”

UzZaman says technology should ideally be able to create a path that connects people with low literacy skills to at least some of the wider Web’s features. He has prototyped a system that digests online text into much simpler sentences that convey the same essential meaning. Images are placed alongside the text to help users get the gist. The prototype works well when tested on Wikipedia pages, says UzZaman.

Rajput says that for now, IBM’s spoken Web is completely separate from the World Wide Web, and that most users are mainly interested in local concerns. However, that could change. “If there is relevant information on the real Web, we can pull it in to the spoken Web using API calls and text-to-speech technology,” says Rajput. “But it needs to be converted to the correct language, and support for that is not good outside U.S. English.”
