It’s an enormous undertaking. The first step is to establish standards that allow users to add explicit descriptive tags, or metadata, to Web content, making it easy to pinpoint exactly what you’re looking for. Next comes developing methods that enable different programs to relate and share metadata from different Web sites. After that, people can begin crafting further features, such as applications that infer new facts from the ones they’re given. As a result, searches will be more accurate and thorough, data entry will be streamlined and the truthfulness of information will be easier to verify. At least that’s the goal.
Many feel it can’t be done. Even though things are heating up in research labs, the Semantic Web as envisioned by Berners-Lee is hampered by social and technical challenges that some critics say may never be solved. But that’s not stopping the World Wide Web Consortium and other organizations from trying. The U.S. Defense Advanced Research Projects Agency (DARPA) and commercial enterprises such as Network Inference in Manchester, England, are already developing tools for building the Semantic Web infrastructure, as well as applications for using it. And according to Berners-Lee, with growing numbers of people beginning to grasp how the Semantic Web will “allow more and more sophisticated agents to do things on their behalf,” we’ll soon see some glimmers of what could be in store.
Untangling the Semantic Web
In his crowded office on the third floor of MIT’s Laboratory for Computer Science building, research scientist Eric Miller doesn’t seem bothered by the pounding and grinding noises coming from heavy equipment on the construction site next door. As the head of the Semantic Web project, the friendly and energetic Miller is too enthralled with his new job to notice. “I’m the luckiest guy alive,” he says. “I get paid for what I’d do for free.”
Berners-Lee tapped Miller to head up the consortium’s Semantic Web Activity because of Miller’s involvement in Web-based knowledge management projects and his ability to enthusiastically articulate the concepts behind the Semantic Web. Standing next to a whiteboard covered in diagrams of metadata in action, Miller explains that the fundamental idea behind the Semantic Web is to make the Internet more useful to people by making the information floating all over the Web more easily manipulated by computers.
Today, by contrast, most content is formatted for human consumption. When you read a news article online, for instance, you can easily pick out the headline, byline, dateline, photo credit and so on. But unless these things are explicitly labeled, a computer has no idea what they are. It simply sees a bunch of text. In the Semantic Web, a news story will be marked with labels that describe its various parts, making it easy, among other things, for a search engine to find articles written by Jimmy Carter and not stories written about him.
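The difference between a story written by someone and a story written about him can be sketched in a few lines of Python. This is only an illustration: the records, the field names (headline, author, body) and the two helper functions are hypothetical, not part of any Semantic Web standard.

```python
# Hypothetical labeled news stories. The field names are illustrative
# assumptions, not a real metadata vocabulary.
articles = [
    {
        "headline": "Building Peace",
        "author": "Jimmy Carter",
        "body": "Lasting peace requires patience.",
    },
    {
        "headline": "A Former President's Legacy",
        "author": "Pat Reporter",
        "body": "Jimmy Carter left office in 1981.",
    },
]


def text_search(records, phrase):
    """Plain keyword search: matches the phrase anywhere in a record."""
    phrase = phrase.lower()
    return [
        r["headline"]
        for r in records
        if any(phrase in value.lower() for value in r.values())
    ]


def author_search(records, name):
    """Metadata-aware search: matches only the labeled author field."""
    return [r["headline"] for r in records if r["author"].lower() == name.lower()]


print(text_search(articles, "Jimmy Carter"))    # both headlines
print(author_search(articles, "Jimmy Carter"))  # only "Building Peace"
```

The keyword search cannot tell the two stories apart, because it sees only a bunch of text. The second search works only because each record says which piece of text is the author.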
That’s not possible today, at least not on a global scale. The formatting tags used to create Web pages are part of the hypertext markup language (HTML), and they describe only what a Web page’s information looks like (boldface, small, large, underlined, etc.). The Semantic Web would go beyond cosmetics by including tags that also describe what the information is: tags would label text as designating, for instance, subject, author, street address, price or shipping charge. These descriptive tags are the metadata: the data about the data. Metadata is not a new concept, nor one restricted to the Internet. A library’s card catalogue, with its records describing a book’s title, author, subject, year and location on the shelves, is metadata.
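To make the contrast concrete, here is a minimal sketch using Python’s standard xml.etree.ElementTree module. The descriptive tag names (item, name, price, shipping) and the total_cost helper are invented for illustration; they are assumptions, not tags defined by HTML or by any Semantic Web specification.

```python
import xml.etree.ElementTree as ET

# Presentation-only markup: it says how the text looks, not what it is.
presentational = "<p><b>Garden Trowel</b> <i>12.50</i> <i>4.00</i></p>"

# Descriptive markup. These tag names are hypothetical, chosen only to
# illustrate labeling text as a price or a shipping charge.
descriptive = """
<item>
  <name>Garden Trowel</name>
  <price>12.50</price>
  <shipping>4.00</shipping>
</item>
"""


def total_cost(xml_text):
    """Add price and shipping, relying on the descriptive labels."""
    item = ET.fromstring(xml_text.strip())
    return float(item.findtext("price")) + float(item.findtext("shipping"))


# From the presentational version a program can recover two italic numbers,
# but nothing tells it which one is the price.
italic_values = [e.text for e in ET.fromstring(presentational).iter("i")]
print(italic_values)

# From the descriptive version the same program can compute a total.
print(total_cost(descriptive))
```

The numbers are the same in both versions. Only the labeled one lets a program know which is the price and which is the shipping charge.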
The Web made it trivially easy to exchange documents between previously incompatible computers (a few of today’s Web users may recall the headaches of the 1980s, when computers from different makers were electronic islands). The Semantic Web will take this a step further, making it possible for computers to exchange particular pieces of information from within documents.