As a global resource built from the spare time of millions of volunteers, Wikipedia may be the epitome of Web 2.0. But the Wikimedia Foundation, a nonprofit organization that runs Wikipedia, among other projects, is now thinking about how to make it a linchpin of Web 3.0, or the semantic Web.

That means making some of the data in Wikipedia’s 15 million (and counting) articles understandable to computers as well as to humans. This would allow software to know, for example, that the numbers in one column of a table listing U.S. presidents are dates. That could, in turn, allow applications that draw on Wikipedia to automatically generate historical timelines or to answer the kind of general-knowledge questions that would normally require a person to find and read the relevant entry on the site.
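As a rough illustration of what machine-readable data would enable, here is a minimal sketch that builds a simple timeline from a handful of already-structured records of the kind a semantically annotated table column could expose. The records are sample values chosen for this example, not data drawn from Wikipedia.

```python
from datetime import date

# Sample structured records of the kind a semantically annotated table of
# U.S. presidents might expose: each term start is a typed date object,
# not a string whose meaning a program would have to guess.
presidents = [
    {"name": "Abraham Lincoln", "term_start": date(1861, 3, 4)},
    {"name": "George Washington", "term_start": date(1789, 4, 30)},
    {"name": "Theodore Roosevelt", "term_start": date(1901, 9, 14)},
]

# Because the dates are typed, sorting them into a timeline is trivial.
for entry in sorted(presidents, key=lambda p: p["term_start"]):
    print(f"{entry['term_start'].isoformat()}  {entry['name']}")
```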

At the 2010 Semantic Technology conference in San Francisco last month, the foundation’s deputy director, Erik Möller, and his colleague Trevor Parscal, a user-experience developer for Wikimedia, showed some of the first steps the foundation has taken to explore how more semantic structure might be added to Wikipedia. They also appealed to the semantic Web community to help develop ways of making Wikipedia’s knowledge more accessible to computers and software.

“Semantic information already exists in Wikipedia, and people are already building on it,” says Möller. “Unfortunately, we’re not really helping, and they have to use extensive processing to do so.”

One example is DBpedia, a semantic database built using software that collects data from the site’s pages, and maintained by the Free University of Berlin and the University of Leipzig, both in Germany. Another is Freebase, a for-profit knowledge database, much of which was also sourced by scraping Wikipedia. Freebase is the data source used by the question-answering search engine Powerset, which Microsoft acquired and folded into its Bing search engine.
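To make that concrete, the sketch below shows how a program might pull a single Wikipedia-derived fact back out of DBpedia. It assumes DBpedia’s public SPARQL endpoint at https://dbpedia.org/sparql and the dbo:birthDate property from DBpedia’s ontology; treat the endpoint, the property, and the resource URI as assumptions rather than anything described in the article.

```python
import json
import urllib.parse
import urllib.request

# Assumed: DBpedia's public SPARQL endpoint (DBpedia is separate from Wikipedia).
ENDPOINT = "https://dbpedia.org/sparql"

# Ask for one fact that DBpedia extracted from a Wikipedia infobox.
# The resource URI and dbo:birthDate property are assumptions based on
# DBpedia's published ontology and may differ in the live dataset.
QUERY = """
SELECT ?birthDate WHERE {
  <http://dbpedia.org/resource/Abraham_Lincoln>
      <http://dbpedia.org/ontology/birthDate> ?birthDate .
}
"""

params = urllib.parse.urlencode(
    {"query": QUERY, "format": "application/sparql-results+json"}
)
with urllib.request.urlopen(f"{ENDPOINT}?{params}") as response:
    results = json.load(response)

# SPARQL JSON results arrive as a list of variable bindings.
for row in results["results"]["bindings"]:
    print(row["birthDate"]["value"])  # e.g. "1809-02-12"
```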

The first targets for Möller and Parscal are the “infoboxes” that appear as summaries on many Wikipedia pages, and the tables found in entries, such as one listing the gross national product of every country in the world.

“Just being able to reuse that data within Wikipedia would be a big thing,” says Yaron Koren, who runs a consultancy specializing in Semantic MediaWiki, an extension to the MediaWiki software used to build Wikipedia. “The manual work that goes into maintaining the many tables and lists today could be eliminated,” he adds. Instead, lists could be generated automatically from the infoboxes of other pages. It would also be possible to generate maps using the location coordinates that appear on some pages, or to automatically generate timelines summarizing periods of history covered across many other pages, says Möller.
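By way of contrast, the sketch below shows the kind of extra processing Möller alludes to: fetching a page’s raw wikitext and digging parameters out of its infobox template with an off-the-shelf parser. It assumes the standard MediaWiki API at https://en.wikipedia.org/w/api.php and the third-party mwparserfromhell library, neither of which is mentioned in the article; the page title is only an example, and the exact API response shape may vary.

```python
import json
import urllib.parse
import urllib.request

import mwparserfromhell  # assumed third-party parser: pip install mwparserfromhell

# Assumed: the standard MediaWiki API exposed by English Wikipedia.
API = "https://en.wikipedia.org/w/api.php"

def fetch_wikitext(title):
    """Return the raw wikitext of a Wikipedia article via the MediaWiki API."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    })
    with urllib.request.urlopen(f"{API}?{params}") as response:
        data = json.load(response)
    return data["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]

# Pull every infobox template out of an example page and list its parameters,
# which is roughly what scrapers do today to recover structured data.
wikicode = mwparserfromhell.parse(fetch_wikitext("Abraham Lincoln"))
for template in wikicode.filter_templates():
    if str(template.name).strip().lower().startswith("infobox"):
        for param in template.params:
            print(str(param.name).strip(), "=", str(param.value).strip())
```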
