Google, Microsoft, and Yahoo have teamed up to encourage Web page operators to make the meaning of their pages understandable to search engines.
The move may finally encourage widespread use of technology that makes online information as comprehensible to computers as it is to humans. If the effort works, the result will be not only better search results, but also a wave of other intelligent apps and services able to understand online information almost as well as we do.
The three big Web companies launched the initiative, known as Schema.org, last week. It defines an interconnected vocabulary of terms that can be added to the HTML markup of a Web page to communicate the meaning of concepts on the page. A location referred to in text could be defined as a courthouse, which Schema.org understands as being a specific type of government building. People and events can also be defined, as can attributes like distance, mass, or duration. This data will allow search engines to better understand how useful a page may be for a given search query—for example, by making it clear that a page is about the headquarters of the U.S. Department of Defense, not five-sided regular shapes.
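To make the idea concrete, here is a rough sketch of what such markup might look like and how a program could read it. It assumes the microdata attributes (`itemscope`, `itemtype`, `itemprop`) that Schema.org's vocabulary is written in. The courthouse name, the phone number, and the helper names `MicrodataParser` and `extract_item` are invented for illustration. Real search engine parsers are far more thorough: this one handles only a single, non-nested item.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the name and phone number are made up.
# The itemtype URL tells a crawler this block describes a courthouse.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Courthouse">
  <span itemprop="name">Example County Courthouse</span>
  <span itemprop="telephone">555-0100</span>
</div>
"""


class MicrodataParser(HTMLParser):
    """Collects the item type and text properties of one flat microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None  # itemprop awaiting its text, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "itemtype" in attr_map:
            self.item_type = attr_map["itemtype"]
        if "itemprop" in attr_map:
            self._current_prop = attr_map["itemprop"]

    def handle_data(self, data):
        text = data.strip()
        if self._current_prop and text:
            self.properties[self._current_prop] = text
            self._current_prop = None

    def handle_endtag(self, tag):
        # An element that closes without text leaves no pending property.
        self._current_prop = None


def extract_item(html):
    """Return (item_type, properties) for the item found in html."""
    parser = MicrodataParser()
    parser.feed(html)
    return parser.item_type, parser.properties


item_type, properties = extract_item(SAMPLE_HTML)
print(item_type)
print(properties)
```

A human visitor sees only the courthouse's name and phone number. The extra attributes are invisible in the browser but tell software what kind of thing the text describes.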
The move represents a major advance in a campaign initiated in 2001 by Tim Berners-Lee, the inventor of the Web, to enable software to access the meaning of online content—a vision known as the “semantic Web.” Although the technology to do so exists, progress has been slow because there have been few reasons for Web page operators to add the extra markup.
Schema.org may change that, says Dennis McCleod, who works on semantic Web technology at the University of Southern California. By tagging information, Web page owners could improve the position of their site in search results—an important source of traffic. “This will motivate people to actually add semantic data to their pages,” says McCleod. “It’s always hard to predict what will be adopted, but generally, unless there’s something in it for people, they won’t do it. Google, Microsoft, and Yahoo have given people a strong reason.”
The Schema.org approach is modeled on microdata, one of the more straightforward methods of describing the meaning of a Web page’s contents. “The trouble with many of these techniques is, they are really hard to use,” says McCleod. “One of the encouraging things about Schema.org is that they are pursuing this at a level that is quite usable, so it is much easier to mark up your website.”