How a Database of the World’s Knowledge Shapes Google’s Future
Compiling a giant database of all the facts in the world could help Google’s future products understand you better.
For all its success, Google’s famous PageRank algorithm has never understood a word of the billions of Web pages it has directed people to over the years. That’s why in 2010 Google acquired Metaweb, a company building a database intended to give computers the ability to understand the world. Two years later the company’s technology resurfaced as the Knowledge Graph (see “Google’s New Brain Could Have a Big Impact”). John Giannandrea, vice president of engineering at Google and a Metaweb cofounder, says that will lead to Google’s future products being able to truly understand the people who use them and the things they care about. He told MIT Technology Review’s Tom Simonite how a data store designed to link together all the knowledge on Earth might do that.
What is the Knowledge Graph?
It’s a distillation of what Google knows about the world. An analogy I often use is maps. For a maps product you have to build a database of the real world and know there are things called streets, rivers, and countries in the physical world. That’s creating a symbolic structure for the physical world; the Knowledge Graph does that for the world of ideas and common sense. We have entities in the Knowledge Graph for foods, recipes, products, ideas in philosophy or history, and famous people. We can have relationships between them, so we can say these two people are married or this place is in this country or we can say this movie is related to this person.
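Editor’s illustration: the entities and relationships described above can be pictured as a small graph of typed nodes and labeled links. The sketch below is a toy written for this article, not Google’s implementation; the class `TinyGraph`, its methods, and the relation names such as `married_to` are all hypothetical.

```python
from collections import defaultdict


class TinyGraph:
    """Toy store of entities and typed relationships (illustrative only)."""

    def __init__(self):
        self.types = {}                 # entity name -> type label
        self.edges = defaultdict(set)   # (subject, relation) -> set of objects

    def add_entity(self, name, type_label):
        self.types[name] = type_label

    def relate(self, subject, relation, obj):
        # Record a directed, labeled link between two entities.
        self.edges[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        # Return every entity linked from `subject` by `relation`, sorted.
        return sorted(self.edges.get((subject, relation), set()))


graph = TinyGraph()
graph.add_entity("Barack Obama", "person")
graph.add_entity("Michelle Obama", "person")
graph.add_entity("London", "place")
graph.add_entity("United Kingdom", "country")
graph.add_entity("Jaws", "movie")
graph.add_entity("Steven Spielberg", "person")

graph.relate("Barack Obama", "married_to", "Michelle Obama")
graph.relate("London", "located_in", "United Kingdom")
graph.relate("Jaws", "directed_by", "Steven Spielberg")

print(graph.objects("London", "located_in"))
```

Storing links by (subject, relation) keeps lookups such as “which country is this place in?” to a single dictionary access; a production system would need far more, including identifiers that distinguish entities sharing a name.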
How does that make a difference to Google’s Web search?
We’ve gone up a level from just talking about the words to talking about what the thing actually is. In crawling and indexing documents we can now have an understanding of what the document is about. If the document is about famous tennis players we actually know it’s about sport and tennis. Every piece of information that we crawl, index, or search is analyzed in the context of Knowledge Graph. That’s not the same as completely understanding the text as you and I might do but it’s a step towards it.
We can now do question answering on Google.com; for example, you can search for “How old is Barack Obama?” We’re also doing things related to exploration. We have a feature called the carousel for exploring categories of entities, so if you type in “London bridges” it will show you a bunch of bridges.
Being able to understand what people are searching for will, of course, help you target search ads. But does Knowledge Graph have uses beyond search?
Inside Google the Knowledge Graph is a piece of infrastructure and it’s getting larger and broader and deeper all the time. It’s a cross-company effort. Almost all the structured data from all of our products like Maps and Finance and Movies and Music are all in the Knowledge Graph, so we can reasonably say that everything we know about is in this canonical form. It lets our product people in all parts of the company be more ambitious.
As a general theme we’re trying to move beyond just searching to actually knowing about things. We think this is essential because we want to understand what you’re trying to do and give you some help. Google Now is an example of a product that is trying to figure out the state that you’re in and make a suggestion to you. To do that effectively you need to have [an understanding] of people, and that they take trips, and that trips on airplanes can be delayed.
One of the main areas is to try and understand at a slightly higher level what text is about. Words that you see in a text are fundamentally ambiguous [to a computer] but if you have Knowledge Graph and can understand how the words are related to each other, then you can disambiguate them. If you see a document that talks about George Bush, Saddam Hussein, and Norman Schwarzkopf, you might be able to guess which Bush it is because only one of them had Norman Schwarzkopf there. That’s like a baby step towards actually understanding what this document is about.
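Editor’s illustration: the George Bush example can be sketched as scoring each candidate entity by how many of its known neighbors also appear in the document. This is a deliberately simple overlap count chosen for illustration, not the method Google uses; the `RELATED` table and the `disambiguate` function are hypothetical, and the table holds only a few hand-picked links.

```python
# Hypothetical, hand-built neighbor sets for two entities that share a surname.
RELATED = {
    "George H. W. Bush": {"Saddam Hussein", "Norman Schwarzkopf"},
    "George W. Bush": {"Saddam Hussein", "Tommy Franks"},
}


def disambiguate(candidates, context_mentions):
    """Pick the candidate whose known neighbors overlap most with the context."""
    context = set(context_mentions)
    scores = {c: len(RELATED.get(c, set()) & context) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores


best, scores = disambiguate(
    ["George H. W. Bush", "George W. Bush"],
    ["Saddam Hussein", "Norman Schwarzkopf"],
)
print(best, scores)
```

Saddam Hussein is linked to both candidates, so he does not separate them; Norman Schwarzkopf is linked to only one, and that single extra overlap decides the result.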
Is the Knowledge Graph complete yet?
It’s growing every second. If a local business updates their opening hours with Google that data will find its way into the Knowledge Graph, for example, and there are algorithms looking at changes in many public websites, such as Wikipedia. We basically take all this raw data and filter it to decide our confidence level and whether to change the Graph. If a famous person dies, for example, we notice and the Knowledge Graph is updated.
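Editor’s illustration: one way to picture “filter it to decide our confidence level” is to combine the reliability of every source reporting the same change and accept the change only above a threshold. The interview does not say how Google computes confidence; the noisy-OR combination, the 0.8 threshold, and the names `combined_confidence` and `should_update` below are all assumptions made for this sketch.

```python
from math import prod

ACCEPT_THRESHOLD = 0.8  # assumed cutoff, chosen only for illustration


def combined_confidence(source_confidences):
    """Noisy-OR: probability that at least one independent source is right."""
    return 1.0 - prod(1.0 - p for p in source_confidences)


def should_update(source_confidences, threshold=ACCEPT_THRESHOLD):
    """Accept a proposed change only if the combined confidence is high enough."""
    if not source_confidences:
        return False
    return combined_confidence(source_confidences) >= threshold


# One moderately reliable source is not enough to change the graph here.
print(should_update([0.6]))
# Two independent sources reporting the same change clear the threshold.
print(should_update([0.6, 0.7]))
```

Noisy-OR treats sources as independent, which real websites often are not since they copy one another, so a practical system would have to account for that.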
People have proposed building these kinds of representations of common sense before in artificial intelligence. I think the thing that distinguishes Knowledge Graph is that it’s a very large and practical implementation of that. The scale and accuracy of the Knowledge Graph is probably unique in history.
What about subjective information, like whether a restaurant is romantic?
This is an ongoing area of work but the Knowledge Graph does contain some subjective data. Sometimes we can look at words, for example this restaurant is known for X, Y, or Z. Genres in general are hard and music genres even harder because people don’t agree what they are. But most databases would have an attempt at listing genre and we can draw on that.
Why does Knowledge Graph look different from the vision of the semantic Web developed by Tim Berners-Lee and others?
The original semantic Web idea was that people with data would emit it in standard formats and then some search engine like Google would come along and aggregate it and provide all kinds of wonderful services. That powerful idea of teaching computers about the world of knowledge wasn’t happening fast enough, and we wanted to get it started by gathering a critical mass of stuff. We recognize that we don’t have all the data in the world, but we think this model is useful. We still operate a public website for Freebase where people can contribute data to the open source database and Google provides public APIs to access it. Usage and contribution to Freebase is growing.