Möller points to WikiPics as an example of the kind of service that could be enabled. It was developed by Daniel Kinzler at the German Wikimedia foundation. Kinzler scraped the links that connect the different language versions of each Wikipedia article into a database, then used it to build a fully multilingual image search. When a user enters the term “horse,” for example, the service also finds images tagged “cheval” (French) and “Pferd” (German). “You’re searching concepts instead of terms,” says Möller. For now, however, the site can update its knowledge only through the slow process of scraping the whole of Wikipedia. A semantic Wikipedia would maintain a live database that could be queried at any time.
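The idea of “searching concepts instead of terms” can be sketched in a few lines. This is a minimal illustration, not WikiPics’ actual code: it assumes a hypothetical in-memory list of interlanguage-link groups (each group being the titles of one article across language editions) and treats each group as a single concept. A query term is mapped to its concept, and the concept is expanded back into every language’s title.

```python
from collections import defaultdict

# Hypothetical interlanguage-link data: each group lists the titles of the
# same article in different language editions as (language code, title).
# This stands in for the link database WikiPics scraped; its real schema
# is not described in the article.
LINK_GROUPS = [
    [("en", "Horse"), ("fr", "Cheval"), ("de", "Pferd")],
    [("en", "Dog"), ("fr", "Chien"), ("de", "Hund")],
]


def build_concept_index(groups):
    """Map each lowercased title to a concept ID, and each concept ID to its titles."""
    term_to_concept = {}
    concept_to_terms = defaultdict(set)
    for concept_id, group in enumerate(groups):
        for lang, title in group:
            term_to_concept[title.lower()] = concept_id
            concept_to_terms[concept_id].add((lang, title))
    return term_to_concept, concept_to_terms


def expand_query(term, term_to_concept, concept_to_terms):
    """Return every (language, title) pair naming the same concept as term."""
    concept_id = term_to_concept.get(term.lower())
    if concept_id is None:
        return set()
    return set(concept_to_terms[concept_id])


if __name__ == "__main__":
    t2c, c2t = build_concept_index(LINK_GROUPS)
    for lang, title in sorted(expand_query("horse", t2c, c2t)):
        print(lang, title)
```

An image search built on this would then look up pictures for every returned title rather than for the literal query. A production index would probably key on language as well as title, since the same spelling can name different things in different languages.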
Wikipedia faces two big challenges in embracing semantic concepts, says Möller. The first is scale: no one has yet built a semantic web service as large as Wikipedia, and it is unclear whether existing software such as Semantic MediaWiki is up to the task.
The second challenge is the feature most responsible for Wikipedia’s success so far: its community. “Thinking about adding semantic structure is a natural extension of what Wikipedia needs to do, given prevailing trends,” says Andrew Lih of the University of Southern California, author of the 2009 book The Wikipedia Revolution. “But I do worry a bit about the database aspect that comes with this; the attraction of wikis in the first place is in the way they have been hand-edited by humans.”
Parscal has been leading efforts to make it easy for anyone to add or edit data in a large semantic store. “We’ve been working on a visual editor that suggests how we might help users contribute structured data, and that also makes the editing process easier,” says Parscal.
Editing Wikipedia is already a daunting process that needs improvement, admits Parscal. “If you’ve interacted with our interface,” he explains, “you’ve been slapped in the face by wikitext” (a markup language that wraps text in special code to format things like links, references, and section headings). The wikitext for tables and infoboxes, the information most ripe for being made semantic, is particularly dense and hard to understand, says Parscal. “We recently did some user experience studies with people who hadn’t used it before; they were quickly quite frustrated.”
In the future, it may be possible to remove the need for humans to populate some parts of Wikipedia altogether, says Möller. “Fundamentally a lot of this data probably shouldn’t be entered by humans in the first place, it should just, say, poll the source of a figure like GDP once a year.” Koren has already added that capability to Semantic MediaWiki through an extension called ExternalData.
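The “poll the source once a year” idea boils down to a cached value that is refreshed from its source only when it goes stale. The sketch below illustrates that general pattern under stated assumptions; it does not reproduce how the ExternalData extension works. The `PolledFigure` class and its `fetch` callable are hypothetical stand-ins for whatever publishes the figure, such as a statistics agency’s GDP release.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class PolledFigure:
    """A value that refreshes itself from an external source on a fixed schedule.

    fetch is a hypothetical callable standing in for the real data source;
    it is not part of any MediaWiki or ExternalData API.
    """

    fetch: Callable[[], float]
    refresh_interval: timedelta = timedelta(days=365)
    value: Optional[float] = None
    fetched_at: Optional[datetime] = None

    def get(self, now: datetime) -> float:
        # Poll the source only if nothing has been fetched yet, or the cached
        # value is at least refresh_interval old; otherwise serve the cache.
        if self.fetched_at is None or now - self.fetched_at >= self.refresh_interval:
            self.value = self.fetch()
            self.fetched_at = now
        return self.value
```

Passing the current time into `get` rather than reading the clock inside it keeps the refresh rule easy to test. Readers between yearly polls all see the cached figure, and no human has to retype it.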