Opening Search to Semantic Upstarts
Yahoo’s new open-search platform is giving semantic search a helping hand.
Even if you have a great idea for a new search engine, it’s far from easy to get it off the ground. For one thing, the best engineering talent resides at big-name companies. Even more significantly, according to some estimates, it costs hundreds of millions of dollars to buy and maintain the servers needed to index the Web in its entirety.
However, Yahoo recently released a resource that may offer hope to search innovators and entrepreneurs. Called Build Your Own Search Service (BOSS), it allows programmers to use Yahoo’s index of the Web–billions of pages that are continually updated–thereby removing perhaps the biggest barrier to search innovation. By opening its index to thousands of independent programmers and entrepreneurs, Yahoo hopes that BOSS will kick-start projects that it lacks the time, money, and resources to invent itself. Prabhakar Raghavan, head of Yahoo Research and a consulting professor at Stanford University, says this might include better ways of searching videos or images, tools that use social networks to rank search results, or a semantic search engine that tries to understand the contents of Web pages, rather than treating them as just a collection of keywords and links.
“We’re trying to break down the barriers to innovation,” says Raghavan, although he admits that BOSS is far from an altruistic venture. If a new search-engine tool built using Yahoo’s index becomes popular and potentially profitable, Yahoo reserves the right to place ads next to its results.
So far, no BOSS-powered site has become that successful. But a number of startups are beginning to build their services on top of BOSS, and Semantic Web companies, in particular, are benefiting from the platform. These companies are developing software to process concepts and meanings in order to better organize information on the Web.
For instance, Hakia, a company based in New York, began building a semantic search engine in 2004. Its algorithms use a database of concepts–people, places, objects, and more–to “understand” concepts in documents. Hakia also creates maps linking together different documents, such as Web pages, based on these concepts in order to understand their relevance to one another. Riza Berkan, CEO of the company, says that focusing on the meaning of pages, instead of simply on the links between them, could serve up more relevant search results and help people find content that they didn’t even know they were looking for.
However, in order to do this well, Hakia needs to have access to as many Web pages as possible, and this is where BOSS fits in. For a given query, Hakia uses Yahoo’s BOSS index to determine a set of relevant results. Hakia’s software then determines whether these pages have already been analyzed by the company’s semantic software. If they haven’t, they will be processed, and the results will be stored on Hakia’s servers. “We crawl the Web anyway,” says Berkan. “But without Yahoo’s index, we’d be behind on the sites that people are searching for today.” And the more popular pages Hakia scans, the better its index will be.
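The flow Berkan describes (fetch results for a query, check which pages have already been analyzed, process and store the rest) can be sketched in a few lines of Python. This is only an illustration, not Hakia's code: the names `analyze_new_pages`, `search_index`, `analyze`, and `store` are hypothetical, the index lookup and the semantic analysis are passed in as plain functions, and an in-memory dictionary stands in for Hakia's servers.

```python
from typing import Callable, Dict, Iterable, List


def analyze_new_pages(
    query: str,
    search_index: Callable[[str], Iterable[str]],  # hypothetical: query -> result URLs
    analyze: Callable[[str], dict],                # hypothetical: URL -> semantic analysis
    store: Dict[str, dict],                        # stand-in for server-side storage
) -> List[str]:
    """Analyze only the result pages that are not yet in the store.

    Returns the URLs that were newly processed for this query.
    """
    newly_processed = []
    for url in search_index(query):
        if url in store:
            # Already analyzed on an earlier query; nothing to do.
            continue
        # Process the page and keep the result for later queries.
        store[url] = analyze(url)
        newly_processed.append(url)
    return newly_processed
```

Under these assumptions, each new query only costs analysis time for pages that have not been seen before, which is consistent with the point that more scanned pages make for a better index.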
Another semantic startup, called Cluuz, from Ontario, Canada, is taking a slightly different approach. When a user searches with Cluuz, she will see Yahoo BOSS results, but they are reordered according to the startup’s own semantic search technology. “When you do a query,” says Alex Zivkovic, CTO of Cluuz, “we pass it on to Yahoo BOSS, and we get a list of results back … Then for each of those pages, the Cluuz engine analyzes the content, extracts entities–people, companies, phone numbers, and those sorts of things.” These concepts, he explains, are then checked against the concepts found on other pages, and the concepts that arise most often are deemed most relevant.
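A rough way to picture this reordering is the Python sketch below. It is not Cluuz's algorithm: the article does not say how pages are scored, so the rule used here (a page's score is the sum, over its entities, of how many result pages mention each entity) is an assumption made for illustration. The function name `rerank_by_shared_entities` is hypothetical, and entity extraction is assumed to have happened already, so each page arrives as a URL paired with a set of entity strings.

```python
from collections import Counter
from typing import List, Set, Tuple


def rerank_by_shared_entities(pages: List[Tuple[str, Set[str]]]) -> List[str]:
    """Reorder result URLs so pages sharing common entities come first.

    `pages` holds (url, entities) pairs in the order the search
    backend returned them.
    """
    # For every entity, count how many result pages mention it.
    page_counts = Counter(
        entity for _, entities in pages for entity in entities
    )

    def score(item: Tuple[str, Set[str]]) -> int:
        # Assumed scoring rule: entities that recur across pages weigh more.
        _, entities = item
        return sum(page_counts[entity] for entity in entities)

    # Python's sort is stable, so tied pages keep their original order.
    ranked = sorted(pages, key=score, reverse=True)
    return [url for url, _ in ranked]
```

With this rule, a page that mentions several entities found on many other results rises to the top, while a page whose entities appear nowhere else sinks.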
“Instead of looking at pages being linked based on the physical links, we’re looking at them in terms of whether or not they are talking about the same concepts,” says Zivkovic. This leads to a different user experience, he adds. For instance, terms relevant to a search query are pulled from the Web and highlighted on the right of the results page. A search for “Kate Greene” immediately pulls up my e-mail address at Technology Review, the university I attended, and a number of the people I’ve interviewed for past stories. Additionally, Cluuz provides other tools that allow the links and relationships between different semantic concepts to be visualized easily.
Even with the power of Yahoo’s index behind them, there’s no guarantee that Hakia or Cluuz will succeed. But if they do take off, their success could help Yahoo, which still lags far behind Google in popularity, regain its edge. “The underlying philosophy [with BOSS] is, we’re not going to be able to invent everything on our own,” says Raghavan. “So we should facilitate innovation.”