Creating the World Wide Web didn’t make Tim Berners-Lee instantly rich or famous. In part, that’s because the Web sprang from relatively humble technologies. Berners-Lee’s invention was based on an information retrieval program called Enquire (named after a Victorian book, Enquire Within upon Everything), which he wrote in 1980 as a contract programmer at the European Organization for Nuclear Research (CERN) in Geneva, Switzerland. In part, it’s because Berners-Lee did the unthinkable when, more than a decade later, he finished writing the tools that defined the Web’s basic structure: he gave them away, with CERN’s blessing, no strings attached. While others made millions off his invention, the soft-spoken programmer went on to found the World Wide Web Consortium (W3C) at MIT, which he still directs, to promote global Web standards and development.
Berners-Lee is finally getting his reward: in July he was knighted by Queen Elizabeth II, and the previous month he received Finland’s million-euro Millennium Technology Prize, awarded “for outstanding technological achievements that directly promote people’s quality of life, are based on humane values, and encourage sustainable economic development.”
Now in new offices in MIT’s Frank Gehry-designed Ray and Maria Stata Center, the 49-year-old native of England is busy overseeing hundreds of projects at the W3C. He is also personally engaged in developing his second big idea: the Semantic Web, which adds definition tags to information in Web pages and links them in such a way that computers can discover data more efficiently and form new associations between pieces of information, in effect creating a globally distributed database. Though part of Berners-Lee’s original intention for his invention, the Semantic Web has been 15 years in the making and has met its share of skepticism. But Berners-Lee believes it will soon win acceptance, enabling computers to extract meaning from far-flung information as easily as today’s Web links individual documents.
The Semantic Web, coupled with other specifications and tools being developed at W3C, including accessibility standards for disabled people and software for mobile devices, is part of Berners-Lee’s grand vision of “a single Web of meaning, about everything and for everyone.” But is it a tangled web we weave? Despite his excitement about the future, Berners-Lee worries that poorly conceived changes to the Web’s organization and governance could compromise its inherent functionality and “universality.” The father of the World Wide Web shared his concerns – and dreams – the day before flying to Helsinki to accept his Millennium prize.
TECHNOLOGY REVIEW: For several years, you’ve been promoting something you call the Semantic Web, but people don’t seem too excited. Why not?
TIM BERNERS-LEE: It’s not the first time I’ve had this paradigm-shift problem. Early on, people really didn’t understand why the Web was interesting. They saw it in the smaller scale, and it’s not interesting in the smaller scale. Same thing with the Semantic Web.
TR: How do you get past that?
B-L: Right now we are just starting by putting applications onto the Semantic Web one by one and linking them up where it seems useful. But what’s exciting is the network effect. The vision is that we will get to a critical mass, where everything starts getting linked into an unimaginably large whole. Then the incentive to add more to it rises exponentially, along with the value of what is out there.
Because few people initially get this great “aha!” of connecting to a huge mass of Semantic Web data, it all has to be done by people who are convinced – who understand that it’s worth putting the effort into getting the thing off the ground.
TR: Then please explain: Why is it worth all this up-front effort?
B-L: The common thread to the Semantic Web is that there’s lots of information out there – financial information, weather information, corporate information – on databases, spreadsheets, and websites that you can read but you can’t manipulate. The key thing is that this data exists, but the computers don’t know what it is and how it interrelates. You can’t write programs to use it.
But when there’s a web of interesting global semantic data, then you’ll be able to combine the data you know about with other data that you don’t know about. Our lives will be enriched by this data, which we didn’t have access to before, and we’ll be able to write programs that will actually help because they’ll be able to understand the data out there rather than just presenting it to us on the screen.
TR: How does the Semantic Web understand data?
B-L: Suppose you’re browsing the Web and you find a seminar advertised, and you decide to go. Now, there is all sorts of information on that page, which is accessible to you as a human being, but your computer doesn’t know what it means. So you must open a new calendar entry and paste the information in there. Then get your address book and add new entries for the people involved in the seminar. And then, if you wanted to be complete, find the latitude and the longitude of the seminar, and program that into your GPS [Global Positioning System] device so you could find it.
It’s very laborious to do all this by hand. What you would like to be able to do is just tell the computer, “I’m going to this seminar.” If there were a Semantic Web version of the page, it would have labeled information on it that would tell the computer “this is an event,” and what time and date it is. And it would automatically add your travel to your event book. It would add the people to your address book, and it would program your GPS to give you directions. It would have the relationships between the event and the various people chairing it. And those people would have Semantic Web personal pages, which contained information about how you could contact them.
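To make the seminar example concrete, here is a minimal Python sketch of what "labeled information" on a page could let a program do. It assumes the page exposes its data as simple (subject, predicate, object) triples; every vocabulary term ("type", "start", "chair", "lat", and so on), every person, and every value below is made up for illustration, as are the helper names `props` and `attend`. It is not how any particular Semantic Web tool works.

```python
from dataclasses import dataclass, field

# Hypothetical labeled data from a Semantic Web version of a seminar page.
# All terms, names, dates, and coordinates are made up for this sketch.
PAGE_TRIPLES = [
    ("seminar1", "type", "Event"),
    ("seminar1", "title", "Semantic Web Seminar"),
    ("seminar1", "start", "2004-09-14T15:00"),
    ("seminar1", "lat", 42.3616),
    ("seminar1", "long", -71.0906),
    ("seminar1", "chair", "alice"),
    ("seminar1", "chair", "bob"),
    ("alice", "name", "Alice Example"),
    ("alice", "email", "alice@example.org"),
    ("bob", "name", "Bob Example"),
    ("bob", "email", "bob@example.org"),
]

@dataclass
class PersonalData:
    """The user's own calendar, address book, and GPS waypoints."""
    calendar: list = field(default_factory=list)
    address_book: dict = field(default_factory=dict)
    gps_waypoints: list = field(default_factory=list)

def props(triples, subject):
    """Collect the predicate/object pairs for one subject into a dict
    (if a predicate repeats, the last value wins)."""
    return {p: o for s, p, o in triples if s == subject}

def attend(triples, event_id, me):
    """Handle "I'm going to this seminar" using only the labeled data."""
    event = props(triples, event_id)
    if event.get("type") != "Event":
        raise ValueError(f"{event_id} is not labeled as an event")
    me.calendar.append((event["start"], event["title"]))
    me.gps_waypoints.append((event["lat"], event["long"]))
    # Follow every "chair" link to a person and copy their contact details.
    for s, p, person in triples:
        if s == event_id and p == "chair":
            info = props(triples, person)
            me.address_book[info["name"]] = info["email"]
    return me
```

Calling `attend(PAGE_TRIPLES, "seminar1", PersonalData())` fills in the calendar entry, both chairs' contact details, and the seminar's coordinates in one step, which is exactly the manual copy-and-paste the interview describes.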
Your address book can now grow from a closed repository of private data to a view on the people-related data in the world.
TR: Does the Semantic Web, then, merely automate many of the things that a human assistant would do?
B-L: No. A human assistant uses a form of intelligence that we are not mimicking here. The human assistant will have the human mind’s ability to suddenly think of correlates across the whole spectrum of his or her experience. “I’ve booked you through Tiawicha because they have the flower festival that weekend, I think, and…well, maybe you’ll like it” is a human thought process.
This is more like giving you a program which can do all the things which your MIS department could write programs to do but doesn’t have time to. But it is still a program, just as the World Wide Web is still a document.
In the future, the Semantic Web will be a great place to develop artificial intelligence (AI) in the strong sense. But right now we are making something quite mechanical – even if we are using bits and pieces of the machinery developed by the AI community over the years.
TR: It would seem an impossibly huge task. How does the technology work?
B-L: The Semantic Web technology tackles the problem in two stages. The more mundane is a common data format. You can take a database or a calendar or an address book or a bank statement or a weather reading – basically anything with hard data in it – and make the machine write it in the basic Semantic Web language, instead of some proprietary or application-specific format. This solves the “syntactic” problem.
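The "syntactic" stage can be illustrated with a short Python sketch. W3C’s basic Semantic Web language is RDF, whose data model is a set of subject, predicate, object triples. The sketch imitates that model with plain tuples and does not use a real RDF library. The two input formats, the `bank:` and `wx:` prefixes, the converter names, and the sample values are all hypothetical.

```python
# Two records in different, application-specific formats (made-up data).
bank_row = "2004-06-15;CHK-1042;-120.00"  # date;check number;amount
weather_record = {"station": "BOS", "when": "2004-06-15", "temp_c": 24, "sky": "sunny"}

def bank_to_triples(row):
    """Rewrite one bank-statement row as triples."""
    date, check, amount = row.split(";")
    subject = f"bank:{check}"
    return {
        (subject, "bank:transactionDate", date),
        (subject, "bank:amount", float(amount)),
    }

def weather_to_triples(rec):
    """Rewrite one weather reading as triples."""
    subject = f"wx:{rec['station']}-{rec['when']}"
    return {
        (subject, "wx:date", rec["when"]),
        (subject, "wx:temperature", rec["temp_c"]),
        (subject, "wx:sky", rec["sky"]),
    }

# Once both sources share one format, they can live in one graph...
graph = bank_to_triples(bank_row) | weather_to_triples(weather_record)

def match(graph, subject=None, predicate=None, obj=None):
    """...and one generic pattern matcher works on all of it.
    None acts as a wildcard."""
    return {
        t for t in graph
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    }
```

The design choice is that nothing in `match` knows about banks or weather. Note what this stage does not fix: `bank:transactionDate` and `wx:date` are still unrelated names to the program, which is the "semantic" problem Berners-Lee turns to next.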
It still doesn’t solve the “semantic” one, though. For that, the Semantic Web first gives names to the basic concepts involved in the data: date and time, an event, a check, a transaction, temperature and pressure, and location. These are all defined just to mean whatever they mean in the system which produces the data – for example, “Transaction date as I get on a bank statement,” and so on. This set of concepts is called an ontology. Then, where there are connections between ontologies, such as when the date and time on a photograph is the same concept as the time on a weather report, we write rules to take advantage of these connections. This allows one to query the Semantic Web agent for photos taken on sunny days, for example. Bit by bit, link by link, the data becomes connected, interwoven. The exciting thing is serendipitous reuse of data: one person puts data up there for one thing, and another person uses it another way.
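Here is a minimal Python sketch of the photos-on-sunny-days query. Two sources keep their own ontologies, and one rule states that each local date property is a special case of a shared date concept. This is roughly what RDF Schema’s `rdfs:subPropertyOf` expresses: if P is a subproperty of Q and (s P o) holds, then (s Q o) holds. All property names, including `shared:date`, and all data are hypothetical. For simplicity, the sketch assumes every photo and weather report refers to the same place.

```python
# Two independent (made-up) sources, each using its own vocabulary.
photos = {
    ("photo:101", "a", "Photo"),
    ("photo:101", "photo:takenOn", "2004-06-14"),
    ("photo:102", "a", "Photo"),
    ("photo:102", "photo:takenOn", "2004-06-15"),
}
weather = {
    ("wx:BOS-2004-06-14", "wx:date", "2004-06-14"),
    ("wx:BOS-2004-06-14", "wx:sky", "rain"),
    ("wx:BOS-2004-06-15", "wx:date", "2004-06-15"),
    ("wx:BOS-2004-06-15", "wx:sky", "sunny"),
}

# Hypothetical rule linking the ontologies: both local date properties
# are special cases of one shared calendar-date concept.
SUBPROPERTY_OF = {"photo:takenOn": "shared:date", "wx:date": "shared:date"}

def apply_rules(graph, subproperty_of):
    """If (s P o) is in the graph and P maps to Q, add (s Q o)."""
    inferred = {(s, subproperty_of[p], o) for s, p, o in graph if p in subproperty_of}
    return graph | inferred

def photos_on_sunny_days(graph):
    """Join photos to weather reports through the shared date concept."""
    sunny_dates = {
        o for s, p, o in graph
        if p == "shared:date" and (s, "wx:sky", "sunny") in graph
    }
    return sorted(
        s for s, p, o in graph
        if p == "shared:date" and (s, "a", "Photo") in graph and o in sunny_dates
    )
```

Without the rule, `photos_on_sunny_days(photos | weather)` finds nothing, because the two sources never use the same term. With `apply_rules(photos | weather, SUBPROPERTY_OF)` it returns `["photo:102"]`. Neither data set was changed. Only the link between their ontologies was added, which illustrates the serendipitous reuse Berners-Lee describes.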
TR: You’ve said that “phase one” of the Semantic Web is finished. Can you explain?
B-L: The way the Semantic Web works is by defining new languages for computers to exchange information. Phase one was getting those first languages, for both syntax and semantics, to the state where they became standards supported by W3C’s members. Interoperability is the key: you can’t call it a Semantic Web application if the program just sits there doing things with its own data format without being able to exchange data with other programs. Now there is this foundation, and anybody who wants to make a new application and publish data can do that, and everybody else’s program will be able to read the data.
TR: What kinds of Semantic Web applications are people making for the next phase?
B-L: Exciting things are happening in the life sciences. The big challenges such as cancer, AIDS, and drug discovery for new viruses require the interplay of vast amounts of data from many fields that overlap – genomics, proteomics, epidemiology, and so on. Some of this data is public, some very proprietary to drug companies, and some very private to a patient. The Semantic Web challenge of getting interoperability across these fields is great but has huge potential benefits.