As a global resource built from the spare time of millions of volunteers, Wikipedia may be the epitome of Web 2.0. But the Wikimedia Foundation, a nonprofit organization that runs Wikipedia, among other projects, is now thinking about how to make it a linchpin of Web 3.0, or the semantic Web.
That means making some of the data in Wikipedia’s 15 million (and counting) articles understandable to computers as well as to humans. Software would then know, for example, that the numbers in one column of a table listing U.S. presidents are dates. That could, in turn, allow applications that draw on Wikipedia to generate historical timelines automatically, or to answer the kind of general-knowledge questions that would usually require a person to find and read a relevant entry on the site.
At the 2010 Semantic Technology conference in San Francisco last month, the foundation’s deputy director, Erik Möller, and colleague Trevor Parscal, a user-experience developer for Wikimedia, showed some first steps taken by the foundation to explore how more semantic structure might be added to Wikipedia. They also appealed to the semantic Web community to help develop ways to make Wikipedia’s knowledge more accessible to computers and software.
“Semantic information already exists in Wikipedia, and people are already building on it,” says Möller. “Unfortunately, we’re not really helping, and they have to use extensive processing to do so.”
One example is DBpedia, a semantic database maintained by the Free University of Berlin and the University of Leipzig, both in Germany, and built using software that collects data from the site’s pages. Another is Freebase, a for-profit knowledge database, much of which was also sourced by scraping Wikipedia. Freebase is the data source used by the question-answering search engine Powerset, which Microsoft acquired to become part of its Bing search engine.
The first targets for Möller and Parscal are the “infoboxes” that appear as summaries on many Wikipedia pages, and the tables within entries, such as one showing the gross national product of every country in the world.
“Just being able to reuse that data within Wikipedia would be a big thing,” says Yaron Koren, who runs a consultancy that specializes in Semantic MediaWiki, an extension to the MediaWiki software used to build Wikipedia. “The manual work that goes into maintaining the many tables and lists today could be eliminated,” he adds. Instead, lists could be generated automatically from the infoboxes of other pages. It would also be possible to generate maps from the location coordinates that appear on some pages, or to generate timelines automatically that summarize periods of history covered by many other pages, says Möller.
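To illustrate why typed data matters, here is a minimal Python sketch. It assumes that infobox contents were already available as typed records. The `InfoboxRecord` class, its field names, and the helper functions `generate_list` and `generate_timeline` are hypothetical, invented for this example. They do not reflect any actual Wikimedia, MediaWiki, or Semantic MediaWiki interface. The sample entries use well-known inauguration dates of early U.S. presidents.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InfoboxRecord:
    """Hypothetical typed infobox entry: every field has a known type."""
    page_title: str
    category: str
    took_office: date  # stored as a real date, not a bare string of digits


def generate_list(records, category):
    """Build a date-ordered list of page titles in one category,
    instead of maintaining that list by hand."""
    matching = [r for r in records if r.category == category]
    return [r.page_title for r in sorted(matching, key=lambda r: r.took_office)]


def generate_timeline(records, start, end):
    """Return (date, title) pairs falling within [start, end], oldest first.
    Comparing and sorting only work because took_office is a typed date."""
    events = [(r.took_office, r.page_title) for r in records
              if start <= r.took_office <= end]
    return sorted(events)


records = [
    InfoboxRecord("Thomas Jefferson", "U.S. president", date(1801, 3, 4)),
    InfoboxRecord("George Washington", "U.S. president", date(1789, 4, 30)),
    InfoboxRecord("John Adams", "U.S. president", date(1797, 3, 4)),
]

print(generate_list(records, "U.S. president"))
# ['George Washington', 'John Adams', 'Thomas Jefferson']
for when, title in generate_timeline(records, date(1790, 1, 1), date(1805, 1, 1)):
    print(when.isoformat(), title)
# 1797-03-04 John Adams
# 1801-03-04 Thomas Jefferson
```

The key design choice in this sketch is to declare `took_office` as a date rather than free text. Sorting and range filtering then come for free. With untyped wikitext, a program would first have to guess which strings are dates, which is the kind of “extensive processing” Möller describes.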