
Google, Microsoft, and Yahoo Team Up to Advance Semantic Web

A push to add meaning to Web pages to aid search could also enable other kinds of intelligent web apps.
June 10, 2011

Google, Microsoft, and Yahoo have teamed up to encourage Web page operators to make the meaning of their pages understandable to search engines.

Web of words: This graph of linked phrases lets software understand the meaning of online content. The system is backed by Google, Microsoft, and Yahoo.

The move may finally encourage widespread use of technology that makes online information as comprehensible to computers as it is to humans. If the effort works, the result will be not only better search results, but also a wave of other intelligent apps and services able to understand online information almost as well as we do.

The three big Web companies launched the initiative, known as Schema.org, last week. It defines an interconnected vocabulary of terms that can be added to the HTML markup of a Web page to communicate the meaning of concepts on the page. A location referred to in text could be defined as a courthouse, which Schema.org understands as being a specific type of government building. People and events can also be defined, as can attributes like distance, mass, or duration. This data will allow search engines to better understand how useful a page may be for a given search query. For example, it can make clear that a page mentioning “the Pentagon” is about the headquarters of the U.S. Department of Defense, not about five-sided regular shapes.
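As a rough illustration of how software might read such markup, the sketch below uses Python's standard `html.parser` module to pull the declared type and properties out of a snippet written in microdata (the format Schema.org asks for), then walks up a hand-copied excerpt of the Schema.org type hierarchy, in which Courthouse sits under GovernmentBuilding, CivicStructure, Place, and finally Thing. The page content is hypothetical, the hierarchy is a four-link excerpt hard-coded for the example, and the parser deliberately ignores nested items and attribute-valued properties. It is a minimal sketch under those assumptions, not a description of how any search engine actually extracts this data.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with Schema.org microdata.
PAGE = """
<div itemscope itemtype="http://schema.org/Courthouse">
  <span itemprop="name">Example County Courthouse</span>
  <span itemprop="address">100 Main Street</span>
</div>
"""

# A hand-copied excerpt of the Schema.org type hierarchy (child -> parent).
# The real vocabulary is much larger; only these links are included here.
PARENT_TYPE = {
    "Courthouse": "GovernmentBuilding",
    "GovernmentBuilding": "CivicStructure",
    "CivicStructure": "Place",
    "Place": "Thing",
}


def ancestors(type_name):
    """Return the chain of types from type_name up to the root, Thing."""
    chain = [type_name]
    while chain[-1] in PARENT_TYPE:
        chain.append(PARENT_TYPE[chain[-1]])
    return chain


class MicrodataExtractor(HTMLParser):
    """Collects item types and (property, text) pairs from flat microdata.

    Simplifications: nested items, itemref, and properties whose value lives
    in an attribute (such as <meta content> or <a href>) are not handled.
    """

    def __init__(self):
        super().__init__()
        self.item_types = []   # itemtype URLs in document order
        self.properties = []   # (itemprop, text) pairs in document order
        self._pending = None   # itemprop still waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)    # boolean attributes like itemscope map to None
        if "itemscope" in attrs and attrs.get("itemtype"):
            self.item_types.append(attrs["itemtype"])
        if attrs.get("itemprop"):
            self._pending = attrs["itemprop"]

    def handle_data(self, data):
        # Attach the first non-blank text after an itemprop tag to that property.
        if self._pending and data.strip():
            self.properties.append((self._pending, data.strip()))
            self._pending = None


def extract(html):
    """Parse html and return (item_types, properties)."""
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return parser.item_types, parser.properties


if __name__ == "__main__":
    types, props = extract(PAGE)
    short_name = types[0].rsplit("/", 1)[-1]   # "Courthouse"
    print(" -> ".join(ancestors(short_name)))
    print(props)
```

Run as a script, it prints `Courthouse -> GovernmentBuilding -> CivicStructure -> Place -> Thing` followed by `[('name', 'Example County Courthouse'), ('address', '100 Main Street')]`. The hierarchy walk is the point: a program that only knows the page is about a Courthouse can still infer that it is a government building and a place.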

The move represents a major advance in a campaign initiated in 2001 by Tim Berners-Lee, the inventor of the Web, to enable software to access the meaning of online content—a vision known as the “semantic Web.” Although the technology to do so exists, progress has been slow because there have been few reasons for Web page operators to add the extra markup.

Schema.org may change that, says Dennis McCleod, who works on semantic Web technology at the University of Southern California. By tagging information, Web page owners could improve the position of their site in search results, an important source of traffic. “This will motivate people to actually add semantic data to their pages,” says McCleod. “It’s always hard to predict what will be adopted, but generally, unless there’s something in it for people, they won’t do it. Google, Microsoft, and Yahoo have given people a strong reason.”

The Schema.org approach is modeled on one of the more straightforward methods of describing the meaning of a Web page’s contents. “The trouble with many of these techniques is, they are really hard to use,” says McCleod. “One of the encouraging things about Schema.org is that they are pursuing this at a level that is quite usable, so it is much easier to mark up your website.”

If many Web page owners act on Schema.org’s suggestions, more than just search will benefit. “This data can be used by any software to cross-correlate things that are related, or to understand the relationship between information from different sources,” says McCleod. For example, widespread availability of semantic information might improve artificially intelligent assistants, such as Siri (bought last year by Apple). It could also enable tools that make good recommendations of, say, news articles, because they could know for sure what the stories are referring to.

However, the companies behind Schema.org made their move unilaterally, without consulting the World Wide Web Consortium (W3C), the standards body for Web technology. “We had no idea this was coming,” says Manu Sporny, a member of the W3C’s Semantic Web Coordination Group.

Schema.org asks for semantic markup to be written using a format known as microdata, which is not yet a W3C standard, rather than RDFa, a more widely used W3C-approved alternative.
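To make the format dispute concrete, here is the same hypothetical fact written once with microdata attributes (`itemscope`, `itemtype`, `itemprop`) and once with RDFa attributes (`vocab`, `typeof`, `property`, as defined in RDFa 1.1), along with a minimal Python reader, built on the standard `html.parser` module, that accepts either syntax. The class and function names are illustrative only, and the reader handles just this flat, single-item case; it is not a model of any company's crawler.

```python
from html.parser import HTMLParser

# The same (hypothetical) fact in the two syntaxes.
MICRODATA = """
<div itemscope itemtype="http://schema.org/Courthouse">
  <span itemprop="name">Example County Courthouse</span>
</div>
"""

RDFA = """
<div vocab="http://schema.org/" typeof="Courthouse">
  <span property="name">Example County Courthouse</span>
</div>
"""


class TypeAndPropertyReader(HTMLParser):
    """Reads one item type and its text properties from either syntax."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._vocab = ""   # RDFa base vocabulary, if declared
        self._prop = None  # property name waiting for its text

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("itemtype"):          # microdata: full type URL
            self.item_type = a["itemtype"]
        if a.get("vocab"):             # RDFa: base vocabulary...
            self._vocab = a["vocab"]
        if a.get("typeof"):            # ...plus a short type name
            self.item_type = self._vocab + a["typeof"]
        # Microdata names properties with itemprop, RDFa with property.
        self._prop = a.get("itemprop") or a.get("property")

    def handle_data(self, data):
        if self._prop and data.strip():
            self.properties[self._prop] = data.strip()
            self._prop = None


def read(html):
    """Return (item_type, properties) for a single flat item."""
    reader = TypeAndPropertyReader()
    reader.feed(html)
    reader.close()
    return reader.item_type, reader.properties


if __name__ == "__main__":
    print(read(MICRODATA))
    print(read(RDFA))
```

Both calls return the same type URL and the same `name` value. For a simple case like this the two formats overlap; Sporny's argument is that RDFa also covers uses that microdata cannot.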

Google has warned that its “crawlers” that roam the Web to build its index could be confused by a page using both microdata and RDFa. Yet Microsoft has previously said its own crawlers have no such problems, says Sporny.

If that confusion isn’t straightened out, he says, microdata may become the only format used at any scale. That would limit the power of the semantic Web, because RDFa can do much more. “RDFa supports use cases that microdata can’t—for example, the WHO publishing mortality rates for different countries or adding semantic information to eBook or image files,” he says.

Sporny hopes that Google and others behind Schema.org will modify their stance on formats. But he acknowledges that having such large companies embrace the semantic approach is a good thing. “They are saying you will get better results with semantic Web concepts,” says Sporny, “and if they encourage more sites to embrace the semantic Web, that will help all kinds of other applications, too.”
