Dewey envisioned all human knowledge as falling along a spectrum whose order could be represented numerically. Even if arbitrary, his system gave context to library searches; when seeking a book on Greek history, for example, a researcher could be assured that other relevant texts would be nearby. A book’s location on the shelves, relative to nearby books, itself aided scholars in their search for information.
As the Web gained ground in the early 1990s, it naturally drew the attention of Miller and the other latter-day Deweys at OCLC. Young as it was, the Web was already outgrowing attempts to categorize its contents. Portals like Yahoo forsook topic directories in favor of increasingly powerful search tools, but even these routinely produced irrelevant results.
Nor was it just librarians who worried about this disorder. Companies like Netscape and Microsoft wanted to lead their customers to websites more efficiently. Berners-Lee himself, in his original Web outlines, had described a way to add contextual information to hyperlinks, to offer computers clues about what would be on the other end.
This idea had been dropped in favor of the simple, one-size-fits-all hyperlink. But Berners-Lee didn’t give it up altogether, and the idea of connecting data with links that meant something retained its appeal.
On the Road to Semantics
By the mid-1990s, the computing community as a whole was falling in love with the idea of metadata, a way of providing Web pages with computer-readable instructions or labels that would be invisible to human readers.
To use an old metaphor, imagine the Web as a highway system, with hyperlinks as connecting roads. The early Web offered road signs readable by humans but meaningless to computers. A human might understand that “FatFelines.com” referred to cats, or that a link led to a veterinarian’s office, but computers, search engines, and software could not.
Metadata promised to add the missing signage. XML–the code underlying today’s complicated websites, which describes how to find and display content–emerged as one powerful variety. But even XML can’t serve as an ordering principle for the entire Web; it was designed to let Web developers label data with their own custom “tags”–as if different cities posted signs in related but mutually incomprehensible dialects.
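The dialect problem can be seen in a minimal Python sketch. The two XML snippets, their tag names, and the book are all invented for illustration: both describe the same book, yet a program written for one site's tags finds nothing in the other's.

```python
import xml.etree.ElementTree as ET

# Two hypothetical bookstores describe the same book with their own custom tags.
SITE_A = "<book><title>A History of Greece</title><price>12.50</price></book>"
SITE_B = "<item><name>A History of Greece</name><cost>12.50</cost></item>"


def find_title(xml_text):
    """Look for a <title> element, the tag name site A happens to use."""
    root = ET.fromstring(xml_text)
    node = root.find("title")
    return node.text if node is not None else None


title_a = find_title(SITE_A)  # site A's dialect is understood
title_b = find_title(SITE_B)  # site B's <name> tag is not recognized
print(title_a, title_b)
```

Both documents are valid XML; the second lookup fails only because nothing tells the program that `<name>` on one site means what `<title>` means on the other.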
In early 1996, researchers at the MIT-based World Wide Web Consortium (W3C) asked Miller, then an Ohio State graduate student and OCLC researcher, for his opinion on a different type of metadata proposal. The U.S. Congress was looking for ways to keep children from being exposed to sexually explicit material online, and Web researchers had responded with a system of computer-readable labels identifying such content. The labels could be applied either by Web publishers or by ratings boards. Software could then use these labels to filter out objectionable content, if desired.
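The filtering step can be pictured with a short Python sketch. The label format, the 0-to-4 rating scale, and the URLs below are assumptions made up for illustration; they are not the format the researchers actually proposed.

```python
# Hypothetical machine-readable labels, keyed by page URL.
# Each label records who applied it and an invented 0-4 explicitness rating.
LABELS = {
    "http://example.com/news": {"rated_by": "publisher", "explicit": 0},
    "http://example.com/adult": {"rated_by": "ratings-board", "explicit": 4},
}


def is_allowed(url, max_explicit=1, labels=LABELS):
    """Return True if the page's label is at or below the chosen threshold.

    Unlabeled pages are blocked here; that is a policy choice made for
    this sketch, not something the labeling proposal is known to require.
    """
    label = labels.get(url)
    if label is None:
        return False
    return label["explicit"] <= max_explicit


pages = [
    "http://example.com/news",
    "http://example.com/adult",
    "http://example.com/unlabeled",
]
visible = [url for url in pages if is_allowed(url)]
print(visible)
```

The software never inspects the page itself. It reads only the label, which is why the same mechanism works whether a publisher or a ratings board supplied it.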
Miller, among others, saw larger possibilities. Why, he asked, limit the descriptive information associated with Web pages to their suitability for minors? If Web content was going to be labeled, why not use the same infrastructure to classify other information, like the price, subject, or title of a book for sale online? That kind of general-purpose metadata–which, unlike XML, would be consistent across sites–would be a boon to people, or computers, looking for things on the Web.