Updating Search

We need to reimagine the role of search engines and their sources of data.

Shashi Seth

April 20, 2010

Real-time search means retrieving information about what’s happening, everywhere, now. The amount of real-time data available is growing rapidly with the proliferation of mobile devices. At Yahoo, we have already begun to incorporate real-time search results from Twitter and sources of developing news. But the scope of real-time data reaches far beyond tweets and Facebook updates. For example, users are uploading photos to Flickr to show what’s happening around them, chatting about the latest news, and answering questions live on sites like Yahoo Answers. That’s just the beginning of the real-time information that can be made available to search engines (see “TR10: Real-Time Search”).

The sheer amount of real-time data presents unique challenges for search. Because a lot of the data is nonauthoritative, noisy, or spammy, search engines need to build trust models that can determine what data is important and influential. For example, retweets are seldom useful results, and some data providers carry more authority than others. Search engines must also strike the right balance between timeliness and relevance for each user. Further, real-time data needs to be indexed and updated instantaneously. A few years ago, search engines took several hours to index new content. Today, they take only a few seconds, but they need to become even faster.
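To make the trade-off concrete, here is a minimal sketch of how a ranking function might combine source trust, relevance, and timeliness. It is an illustration only, not a description of how Yahoo or any other search engine ranks results. The `Item` fields, the ten-minute freshness half-life, the retweet penalty, and the `timeliness_weight` parameter are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    source_trust: float   # 0..1, hypothetical reputation score for the source
    relevance: float      # 0..1, hypothetical query-match score
    age_seconds: float    # how long ago the item was published
    is_retweet: bool = False


def freshness(age_seconds, half_life_seconds=600.0):
    """Exponential decay: an item loses half its freshness every half-life."""
    return 0.5 ** (age_seconds / half_life_seconds)


def score(item, timeliness_weight=0.5):
    """Blend relevance and freshness, then scale by trust in the source."""
    blended = ((1.0 - timeliness_weight) * item.relevance
               + timeliness_weight * freshness(item.age_seconds))
    # Assumed penalty: retweets repeat existing content, so count them for less.
    penalty = 0.5 if item.is_retweet else 1.0
    return item.source_trust * penalty * blended


def rank(items, timeliness_weight=0.5):
    """Return items ordered from highest to lowest score."""
    return sorted(items, key=lambda i: score(i, timeliness_weight), reverse=True)


if __name__ == "__main__":
    items = [
        Item("wire report on the outage", 0.9, 0.8, age_seconds=60),
        Item("anonymous rumor about the outage", 0.2, 0.9, age_seconds=10),
        Item("RT: wire report on the outage", 0.9, 0.8, age_seconds=30, is_retweet=True),
    ]
    for item in rank(items):
        print(round(score(item), 3), item.text)
```

Raising `timeliness_weight` toward 1.0 favors the newest items, while lowering it toward 0.0 favors the closest query matches. A per-user setting of that weight is one simple way to picture the balance described above.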

With the challenges of using real-time data come some exciting possibilities for reimagining search. As in the early days of the Web, when Yahoo built a directory to identify authoritative sites, we are seeing search engines building better trust models. Aggregators are emerging to qualify the reputations of sources. Many other types of self-organization are possible in this new realm.

We can imagine that, to speed up the rate at which search engines are able to share real-time data, some sources will notify search engine indexes directly when something is happening. Rather than just waiting for search engines to crawl a site, users can push relevant new information. Say you are looking for a parking space in busy downtown San Francisco: parking lots might send updates to search engine indexes as spaces become available.
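The parking example can be sketched as a small in-memory index that accepts pushed updates instead of waiting for a crawl. This is a toy illustration under stated assumptions, not a real search engine interface: the `PushIndex` class, its `push`, `remove`, and `search` methods, and the lot identifiers are all hypothetical.

```python
import time
from collections import defaultdict


class PushIndex:
    """Toy inverted index that sources update by pushing documents to it."""

    def __init__(self):
        self._docs = {}                     # doc_id -> (text, timestamp)
        self._postings = defaultdict(set)   # term -> set of doc_ids

    def push(self, doc_id, text, timestamp=None):
        """Add or replace a document; it is searchable as soon as this returns."""
        self.remove(doc_id)
        ts = time.time() if timestamp is None else timestamp
        self._docs[doc_id] = (text, ts)
        for term in set(text.lower().split()):
            self._postings[term].add(doc_id)

    def remove(self, doc_id):
        """Drop a document and its postings, if it exists."""
        old = self._docs.pop(doc_id, None)
        if old is None:
            return
        for term in set(old[0].lower().split()):
            self._postings[term].discard(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term, newest first."""
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(ids, key=lambda d: self._docs[d][1], reverse=True)


if __name__ == "__main__":
    index = PushIndex()
    index.push("lot-a", "parking 3 spaces available mission street", timestamp=100)
    index.push("lot-b", "parking 12 spaces available market street", timestamp=200)
    print(index.search("parking available"))
    # The lot fills up and pushes a new status, replacing its earlier entry.
    index.push("lot-a", "parking full mission street", timestamp=300)
    print(index.search("parking available"))
```

The design choice worth noting is that the source, not the search engine, decides when the index changes. A stale entry disappears the moment the lot reports that it is full, with no crawl delay in between.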

Already, we know that real-time search can serve needs other than those of traditional Web search. The resulting data can be invaluable in answering long-tail queries, those that aren’t related to the most popular topics. The potential uses of real-time search are limitless. Finding and inventing new uses will make search even more valuable in our daily lives.

Shashi Seth is senior vice president of search products at Yahoo.
