Google, Microsoft, and Yahoo have teamed up to encourage Web page operators to make the meaning of their pages understandable to search engines.
The move may finally encourage widespread use of technology that makes online information as comprehensible to computers as it is to humans. If the effort works, the result will be not only better search results, but also a wave of other intelligent apps and services able to understand online information almost as well as we do.
The three big Web companies launched the initiative, known as Schema.org, last week. It defines an interconnected vocabulary of terms that can be added to the HTML markup of a Web page to communicate the meaning of concepts on the page. A location referred to in text could be defined as a courthouse, which Schema.org understands as being a specific type of government building. People and events can also be defined, as can attributes like distance, mass, or duration. This data will allow search engines to better understand how useful a page may be for a given search query—for example, by making it clear that a page is about the headquarters of the U.S. Department of Defense, not five-sided regular shapes.
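To make the idea concrete, here is a minimal sketch in Python of how a program might read this kind of markup. It assumes the page uses the microdata attributes `itemscope`, `itemtype`, and `itemprop` together with Schema.org's Courthouse type. The page fragment, the courthouse name, the phone number, and the helper names `MicrodataSketch` and `extract_items` are invented for illustration; this is not how any search engine's crawler actually works.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the courthouse name and phone number are made up.
PAGE = """
<div itemscope itemtype="http://schema.org/Courthouse">
  <h1 itemprop="name">Example County Courthouse</h1>
  <span itemprop="telephone">555-0100</span>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Collects top-level microdata items and their text-valued properties.

    Simplifications (assumptions of this sketch): nested items are not
    split out, and void elements such as <br> inside an item are not handled.
    """

    def __init__(self):
        super().__init__()
        self.items = []      # every item found so far
        self.current = None  # the item whose itemscope we are inside, if any
        self.depth = 0       # open-element depth inside the current item
        self.prop = None     # itemprop name still waiting for its text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.current is None:
            # An element carrying itemscope starts a new item.
            if "itemscope" in attrs:
                self.current = {"type": attrs.get("itemtype"), "properties": {}}
                self.items.append(self.current)
                self.depth = 1
            return
        self.depth += 1
        if "itemprop" in attrs:
            self.prop = attrs["itemprop"]

    def handle_data(self, data):
        text = data.strip()
        if self.current is not None and self.prop and text:
            self.current["properties"][self.prop] = text
            self.prop = None

    def handle_endtag(self, tag):
        if self.current is not None:
            self.depth -= 1
            if self.depth == 0:
                # The element that opened the item has closed.
                self.current = None


def extract_items(html):
    """Return a list of {"type": ..., "properties": {...}} dictionaries."""
    parser = MicrodataSketch()
    parser.feed(html)
    return parser.items


if __name__ == "__main__":
    for item in extract_items(PAGE):
        print(item["type"], item["properties"])
```

The point of the sketch is that the type URL, rather than the visible text, tells the program what kind of thing the page describes.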
The move represents a major advance in a campaign initiated in 2001 by Tim Berners-Lee, the inventor of the Web, to enable software to access the meaning of online content—a vision known as the “semantic Web.” Although the technology to do so exists, progress has been slow because there have been few reasons for Web page operators to add the extra markup.
Schema.org may change that, says Dennis McCleod, who works on semantic Web technology at the University of Southern California. By tagging information, Web page owners could improve the position of their site in search results—an important source of traffic. “This will motivate people to actually add semantic data to their pages,” says McCleod. “It’s always hard to predict what will be adopted, but generally, unless there’s something in it for people, they won’t do it. Google, Microsoft, and Yahoo have given people a strong reason.”
The Schema.org approach is modeled on one of the more straightforward methods of describing the meaning of a Web page’s contents. “The trouble with many of these techniques is, they are really hard to use,” says McCleod. “One of the encouraging things about Schema.org is that they are pursuing this at a level that is quite usable, so it is much easier to mark up your website.”
If many Web page owners act on Schema.org’s suggestions, more than just search will benefit. “This data can be used by any software to cross-correlate things that are related, or to understand the relationship between information from different sources,” says McCleod. For example, widespread availability of semantic information might improve artificially intelligent assistants, such as Siri (bought last year by Apple). It could also enable tools that make good recommendations of, say, news articles, because they would know for certain what the stories refer to.
However, the companies behind Schema.org made their move unilaterally, without consulting the World Wide Web Consortium (W3C), the standards body for Web technology. “We had no idea this was coming,” says Manu Sporny, a member of the W3C’s Semantic Web Coordination Group.
Schema.org asks for semantic markup to be written using a format known as microdata, which is not yet a W3C standard, rather than RDFa, a more widely used W3C-approved alternative.
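The difference between the two formats is easiest to see side by side. The sketch below, in Python, holds the same statement written once with microdata attributes and once with RDFa 1.1 attributes (`vocab`, `typeof`, `property`), then uses a rough heuristic to report which syntaxes a page contains. The person's name, the attribute sets chosen for detection, and the helper names `AttrCollector` and `syntaxes_used` are assumptions made for illustration, not part of either specification or of any crawler.

```python
from html.parser import HTMLParser

# The same statement ("there is a Person named Jane Doe") in each syntax.
# "Jane Doe" is a placeholder name.
MICRODATA = (
    '<div itemscope itemtype="http://schema.org/Person">'
    '<span itemprop="name">Jane Doe</span></div>'
)
RDFA = (
    '<div vocab="http://schema.org/" typeof="Person">'
    '<span property="name">Jane Doe</span></div>'
)

# Attribute names treated as evidence of each syntax (a heuristic assumption).
MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_ATTRS = {"vocab", "typeof", "property"}


class AttrCollector(HTMLParser):
    """Records every attribute name that appears on any start tag."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        self.seen.update(name for name, _ in attrs)


def syntaxes_used(html):
    """Return the set of semantic-markup syntaxes that seem to be present."""
    collector = AttrCollector()
    collector.feed(html)
    found = set()
    if collector.seen & MICRODATA_ATTRS:
        found.add("microdata")
    if collector.seen & RDFA_ATTRS:
        found.add("rdfa")
    return found


if __name__ == "__main__":
    print(syntaxes_used(MICRODATA))
    print(syntaxes_used(RDFA))
    # A page that mixes both formats.
    print(syntaxes_used(MICRODATA + RDFA))
```

A check like this only looks at attribute names, so it can flag a page that mixes the two formats but says nothing about whether either is used correctly.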
Google has warned that the “crawlers” it sends across the Web to build its index could be confused by a page that uses both microdata and RDFa. Yet Microsoft has previously said its own crawlers have no such problem, says Sporny.
If that confusion isn’t straightened out, he says, microdata may become the only format used at any scale, which would limit the power of the semantic Web, because RDFa can do much more. “RDFa supports use cases that microdata can’t, for example, the WHO publishing mortality rates for different countries or adding semantic information to eBook or image files,” he says.
Sporny hopes that Google and others behind Schema.org will modify their stance on formats. But he acknowledges that having such large companies embrace the semantic approach is a good thing. “They are saying you will get better results with semantic Web concepts,” says Sporny, “and if they encourage more sites to embrace the semantic Web, that will help all kinds of other applications, too.”