Facebook Nudges Users to Catalog the Real World

Taking aim at Google, the largest social network wants a database describing as many things as possible.

Tom Simonite

February 27, 2013

More than one billion people visit Facebook each month, mostly to see photos and messages posted by friends. Facebook hopes to encourage some of them to do a little work for it while they’re there. By asking people to contribute data—from business locations to book titles—and to check one another’s work, Facebook is building a rich stock of knowledge that could make its software smarter and boost the usefulness of its search engine.

**Social circles:** Facebook CEO Mark Zuckerberg announced Graph Search at his company’s headquarters in Menlo Park, California.

“We’re trying to map what the real world looks like onto Facebook so you can run really expressive and powerful queries,” says Mitu Singh, product manager for Facebook’s entities team, a group charged with building a resource called the entity graph.

The entity graph is a little-known companion of Facebook’s famous social graph of around a billion people and 150 billion friend connections. It describes everything from the restaurants of New York to the concept of philosophy, along with the connections between them. Singh and colleagues jokingly refer to their work on the entity graph as “project job security,” since mapping every entity in the world is a distant prospect.

That knowledge store is seen as vital to the ambitions of Facebook’s Graph Search service, unveiled last month and as yet available to only a fraction of the company’s users (see “Facebook’s Graph Search Isn’t That Great”). Unlike a conventional search engine, Graph Search is designed to understand the meaning of the phrases entered by a searcher and then deliver specific results such as people, places, books, or movies rather than just links to Web pages.

Facebook has already collected some information automatically by drawing on data from Wikipedia and other datasets. But as it has become clearer that the resource would be crucial to ambitions such as search, the company has put new emphasis on finding ways to harness the site’s hundreds of millions of members as a kind of human indexing squad, a flesh-and-blood version of Google’s Web crawler. “Building classifiers and so on only gets you so far,” says Singh. “At some point you need people to help out.”

Singh’s team has loaded millions of entries into the entity graph by simply watching what people do on Facebook. Entities such as colleges and employers are learned from data typed into profile pages; businesses, movies, fictional characters, and other concepts are learned from fan pages created by Facebook users. Prompting people to tag duplicate content has taught Facebook’s entity graph the different names people use for one entity: for example, that NPR and National Public Radio are the same thing. And analyzing many employment histories on the site allows Facebook’s search engine to know that a search for “software engineers” should also return people who say they are “coders.”
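One way to picture how duplicate tags could turn into alias knowledge is the sketch below. It is only an illustration, not Facebook’s implementation: the `AliasIndex` class and its method names are hypothetical, and it assumes every duplicate tag simply merges two names into one group, using a standard union-find structure.

```python
class AliasIndex:
    """Hypothetical store that groups names flagged as the same entity."""

    def __init__(self):
        # Maps each known name to its parent; a root is its own parent.
        self._parent = {}

    def _find(self, name):
        # Register unseen names as their own group.
        self._parent.setdefault(name, name)
        # Walk up to the root, shortening the path along the way.
        while self._parent[name] != name:
            self._parent[name] = self._parent[self._parent[name]]
            name = self._parent[name]
        return name

    def mark_duplicate(self, a, b):
        """Record that a user tagged names a and b as the same thing."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def aliases(self, name):
        """Return every known name in the same group, sorted."""
        root = self._find(name)
        return sorted(n for n in list(self._parent) if self._find(n) == root)


index = AliasIndex()
index.mark_duplicate("NPR", "National Public Radio")
index.mark_duplicate("software engineer", "coder")
print(index.aliases("coder"))
```

A search layer built on such an index could expand a query for one name into all of its aliases before matching profiles or pages.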

Facebook is now littered with tiny nudges to encourage people to contribute more directly, and these hints may get more direct as the new search function becomes more important. Pages for places such as museums or stores offer “suggest an edit” links that enable a person to change or add information such as opening hours, location, and phone number. Singh says his team is planning to roll out similar appeals for help on other pages, such as those for movies and books.

Facebook’s most sophisticated tool for tapping into the collective knowledge of its users is an interface called the “places editor.” It guides users to fix errors in or add to the data for places in Facebook’s entity graph; in one mode it allows a person to rapidly click “yes” or “no” buttons to screen for duplicate entries. It might sound dull, but Facebook has found it to be surprisingly popular. “As with Wikipedia, what we’re seeing is people really passionate about their hometown or their current town,” says Singh. “They really like to make sure that it’s fixed and corrected.”

Facebook is using social cues to encourage more contributions. The places editor shows a running count of how many people a user’s edits have “helped,” defined as updated data another person has interacted with on Facebook. Clearing up whether a handful of Walgreens locations in San Francisco were duplicates was enough to earn a message that more than 1,000 people had benefited, an impressive slug of psychological reinforcement. Messages sometimes appear to let people know which of their friends recently tidied up data. A similar tactic saw Facebook drive a spike in organ donor registrations across the U.S. in 2012 (see “What Facebook Knows”).
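The “helped” tally can be read as a count of distinct other people who touched data after a user updated it. The sketch below works under that reading only; the function name, the record layout, and the sample data are all hypothetical, and Facebook has not described how it actually computes the figure.

```python
def helped_count(editor, edits, interactions):
    """Count distinct other users who interacted with data the editor updated.

    edits and interactions are iterables of (user, entity_id, timestamp).
    Assumption: an interaction counts only if it happens at or after the
    editor's first edit to that entity.
    """
    first_edit = {}
    for user, entity, when in edits:
        if user == editor:
            first_edit[entity] = min(when, first_edit.get(entity, when))

    helped = set()
    for user, entity, when in interactions:
        if user != editor and entity in first_edit and when >= first_edit[entity]:
            helped.add(user)
    return len(helped)


# Made-up sample data for illustration.
edits = [("ann", "walgreens-1", 10), ("ann", "walgreens-2", 12), ("bob", "cafe-9", 5)]
interactions = [
    ("cy", "walgreens-1", 11),
    ("dee", "walgreens-2", 20),
    ("cy", "walgreens-2", 21),   # same person again, counted once
    ("eve", "walgreens-1", 3),   # before ann's edit, not counted
    ("ann", "walgreens-1", 30),  # the editor herself, not counted
    ("fay", "cafe-9", 8),
]
print(helped_count("ann", edits, interactions))
```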

Facebook is not alone in thinking that a resource like the entity graph will be important to the future of search. Google last year revealed a project called Knowledge Graph, its own system for storing information about entities and their relationships (see “Google’s New Brain Could Have a Big Impact”), which is currently used to return specific answers to some factual queries.

Both Google’s and Facebook’s graphs could ultimately improve more than search. Facebook’s entity graph is used to help guess at which topics a person would want to see in their feed of updates from friends. The two companies could also use their graphs to target advertisements.

Both projects can be seen as examples of the semantic Web, an evolutionary step in the history of online information predicted and worked on for over a decade by, amongst others, the W3C, the body that develops Web technology and standards. A core idea was to enable Web pages—and data stores—that allowed machines to understand the meaning of the text, images, and other data that humans look for on more conventional Web pages.

“The Web was designed to provide the backbone of the entity and knowledge graphs that Facebook and Google are building right now,” says Manu Sporny, who chairs the working group at the W3C concerned with RDFa, a technology used—including by Google and Facebook—to add semantic data to Web pages.
