Late last month, Google’s search engine got significantly smarter.
A store of information dubbed the “Knowledge Graph” now adds useful context and detail to the list of links that Google serves up. Searching for certain people, places, or things produces a box of facts alongside the regular results. The Knowledge Graph is already starting to appear in a few other Google products, and could be used to add intelligence to all of the company’s software.
“Search was mostly based on matching words and phrases, and not what they actually mean,” says Shashidar Thakur, the tech lead for the Knowledge Graph in Google’s search team. Thakur says the project was created to change that.
The Knowledge Graph can be thought of as a vast database that allows Google’s software to connect facts on people, places, and things to one another. Google got the Knowledge Graph project started when it bought a startup called Metaweb in 2010; at that time, the resource contained only 12 million entries. Today it has more than 500 million entries, with more than 3.5 billion links between them.
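The idea of entries joined by links can be sketched as a tiny in-memory graph, where each fact is a labeled link from one entity to another. This is a minimal illustration under stated assumptions, not Google’s design: the class `EntityGraph`, its methods, and the relation names are hypothetical, and a store with hundreds of millions of entries would look very different from a Python dictionary.

```python
from collections import defaultdict


class EntityGraph:
    """Toy knowledge store (hypothetical): entities are nodes, facts are labeled links."""

    def __init__(self):
        # Maps a subject entity to a list of (relation, object entity) pairs.
        self._links = defaultdict(list)

    def add_fact(self, subject, relation, obj):
        self._links[subject].append((relation, obj))

    def related(self, entity, relation):
        """Return every entity linked from `entity` by `relation`."""
        # .get() avoids creating empty entries for unknown entities.
        return [o for r, o in self._links.get(entity, []) if r == relation]

    def entity_count(self):
        # Count both subjects and objects, since either can be an entry.
        entities = set(self._links)
        for facts in self._links.values():
            entities.update(o for _, o in facts)
        return len(entities)

    def link_count(self):
        return sum(len(facts) for facts in self._links.values())


# Facts taken from this article; the relation names are hypothetical.
graph = EntityGraph()
graph.add_fact("Metaweb", "acquired_by", "Google")
graph.add_fact("Freebase", "created_by", "Metaweb")
graph.add_fact("Knowledge Graph", "built_by", "Google")
```

Here `graph.related("Freebase", "created_by")` follows one link to `["Metaweb"]`; the same lookup, repeated across entries, is what lets software hop from one fact to a connected one.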
Such a stock of knowledge about the world should have uses beyond just helping people who are searching for facts online. Thakur says that the Knowledge Graph has already been plugged into YouTube, where it is being used to organize videos by topic and to suggest new videos to users, based on what they just watched. It could also be used to connect and recommend news articles based on the specific facts mentioned in stories, says Thakur. “Knowledge Graph is a very general resource; it’s like a ground truth we can refer to.”
When a person searches on Google, the conventional results are produced by algorithms that match the terms typed into the search box rather than their meaning. First, Google’s algorithms consult data from past searches to determine which words in the query are most likely to be important, based on how often previous searchers have used them. Next, software consults reverse indexes (also called inverted indexes): lists of Web pages known to contain information related to those terms. Finally, another calculation ranks the results shown to the searcher. With luck, what the searcher is looking for will be found somewhere in those pages.
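The three steps above can be sketched in a few lines. This is a toy model, not Google’s actual algorithms: the query log, the pages, and the scoring rule (a page scores the summed past-query frequency of each query term it contains) are all hypothetical stand-ins chosen to make each step visible.

```python
from collections import Counter, defaultdict

# Hypothetical log of earlier queries, used to estimate term importance.
past_queries = ["eiffel tower height", "eiffel tower tickets",
                "tower of london", "the height of everest"]

# Hypothetical tiny corpus of Web pages.
pages = {
    "page1": "the eiffel tower is 330 metres in height",
    "page2": "buy tickets for the tower of london",
    "page3": "everest height in metres",
}


def term_weights(query, log):
    """Step 1: weight each query term by how often earlier searchers used it."""
    counts = Counter(word for q in log for word in q.split())
    return {t: counts[t] for t in query.split() if counts[t] > 0}


def build_reverse_index(docs):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for page_id, text in docs.items():
        for word in text.split():
            index[word].add(page_id)
    return index


def search(query, log, docs):
    weights = term_weights(query, log)
    index = build_reverse_index(docs)
    scores = Counter()
    for term, weight in weights.items():
        # Step 2: look up pages containing the term in the reverse index.
        for page_id in index.get(term, ()):
            scores[page_id] += weight
    # Step 3: rank pages by total score, highest first.
    return [page_id for page_id, _ in scores.most_common()]
```

For the query `"eiffel tower height"` this toy returns `page1`, then `page2`, then `page3`: the Tower of London page outranks the Everest page only because “tower” is a heavier word in the log, which is the word-matching-not-meaning behavior the paragraph describes.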
Google’s new approach, made possible through the Knowledge Graph, is to try to interpret what a person is asking about in a much more sophisticated way and directly retrieve relevant information.
However, data from past searches is still used to determine what information is most relevant. For example, people often add the word “cast” when searching for TV shows, so the actors in a series are usually listed when it surfaces from the Knowledge Graph. “It’s a learning process,” says Thakur. “The queries that people are doing tell us what people are interested in.” This also helps Google figure out new links between concepts in the Knowledge Graph. Both the number of entries and links between them are growing fast, says Thakur, although he declined to say just how rapidly.
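The feedback loop Thakur describes, where extra query words such as “cast” decide which facts appear, could be approximated by counting the words searchers add to an entity’s name. This is a sketch under stated assumptions, not a description of Google’s method: the function `popular_attributes`, the query log, the attribute list, and the show name “example show” are all hypothetical.

```python
from collections import Counter


def popular_attributes(entity, query_log, known_attributes, top_n=2):
    """Rank an entity's attributes by how often searchers mention them with the entity.

    Hypothetical helper: only queries containing the entity name count, and only
    words in `known_attributes` are treated as attributes.
    """
    entity = entity.lower()
    counts = Counter()
    for query in query_log:
        query = query.lower()
        if entity in query:
            # Drop the entity name itself and inspect the remaining words.
            for word in query.replace(entity, " ").split():
                if word in known_attributes:
                    counts[word] += 1
    return [attr for attr, _ in counts.most_common(top_n)]


query_log = [
    "example show cast",
    "example show cast list",
    "example show episodes",
    "example show cast season 2",
    "other show cast",
]
top = popular_attributes("Example Show", query_log, {"cast", "episodes", "ratings"})
```

With this log, `top` is `["cast", "episodes"]`, so a fact box built from it would list the cast first. A production system would also need to handle synonyms and misspellings, which this sketch ignores.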
Thakur wouldn’t say where the Knowledge Graph will pop up next, but the technology seems likely to appear across Google’s many products. Web pioneers such as Tim Berners-Lee have long talked up the idea of a “semantic Web,” where software can process the meaning of online information, and Knowledge Graph seems a significant step toward this vision.
However, Kingsley Idehen, founder of semantic technology company OpenLink Software, says that Knowledge Graph is not really helping advance the semantic Web because it is not openly accessible—despite being compiled using open data sources such as Wikipedia and Freebase. If Google were to open up its Knowledge Graph for others to use, then the Web as a whole could get much smarter, says Idehen.
“They’ve released a deliberately closed solution,” he says, contrasting that with Facebook’s own knowledge store known as the Open Graph, a public resource that software can use to access information on music, movies, recipes, and more. Such open solutions, says Idehen, “actually contribute to evolving the Web into a global data space by exposing the keys to their records.”
A Google spokesperson wouldn’t say whether the Knowledge Graph would be opened up, but pointed out that some of what it contains is freely available for access by humans and software at Freebase, a site created by Metaweb before it was acquired by Google. However, Freebase is simpler than the Knowledge Graph, which Google is continuing to make smarter. Google is also one of the main funders of Wikidata, a project of the Wikimedia Foundation, the organization behind Wikipedia. Wikidata aims to create a store of machine-accessible knowledge that could become very large if it takes off like Wikipedia.
Back at Google, Thakur says that his current priority is to find ways to use the Knowledge Graph to answer more complex questions, some of which seem similar to those tackled by the “knowledge engine” Wolfram Alpha. “Right now what we have is answering questions about entities, but there are harder questions,” he says. “For example: ‘volcanoes that exploded in the eighteenth century,’ or ‘movies based on books.’”