TR: When did it occur to you that you could use algorithms to optimize content delivery on the Web?
LEIGHTON: The first time I ever thought about the Internet was in 1995. My office [at MIT’s LCS] is down the hall from Tim Berners-Lee and the Web Consortium. Over time we talked about some of the issues facing the Internet. These are the kinds of large-scale networking problems that our group was working on and that I have a long-term interest in. So we took on some of them as research projects.
TR: In a sense, the Internet is really the ultimate networking challenge, isn’t it?
LEIGHTON: Yes. That’s right.
TR: What was the problem that you started with in ‘95?
LEIGHTON: We were looking at ways to deal with flash crowding and hot-spotting. That’s where a lot of people go to one site at one time and swamp the site and bring down the network around it, and make everyone unhappy.
TR: Can you explain the technologies you’ve developed?
LEIGHTON: Today we’re probably one of the world’s largest distributed networks. At a high level, we’re serving content or handling applications for end users, and we’re doing that from servers that are close to the end users. “Close” is something that changes dynamically, based on network conditions, server performance and load. Because we’re close, we can avoid a lot of the hangups, delays and packet loss that you might experience if you’re far away. Before, you typically got your interaction with a central Web site. And typically that was far away. Now you typically have a lot of your interactions (not all, but a lot) with an Akamai server that is near you and is selected in real time.
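The idea that “close” is recomputed from live conditions rather than geography alone can be sketched with a simple cost model. This is a hypothetical illustration, not Akamai’s actual selection algorithm; the server names, latencies, and the latency-times-load cost function are all invented for the example.

```python
def pick_server(servers):
    """Choose the 'closest' server, where closeness is recomputed from
    current network latency and server load rather than geography alone."""
    def cost(s):
        # Penalize both round-trip latency and load: a nearby but
        # overloaded server can score worse than a farther, idle one.
        return s["latency_ms"] * (1.0 + s["load"])
    return min(servers, key=cost)

servers = [
    {"name": "boston-1", "latency_ms": 5, "load": 0.95},   # near, but busy
    {"name": "chicago-2", "latency_ms": 8, "load": 0.10},  # farther, idle
]
print(pick_server(servers)["name"])  # → chicago-2
```

Because load and latency change moment to moment, this cost is re-evaluated for each request, which is what makes the choice “in real time.”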
TR: What are the tricks and challenges to making this distributed system work?
LEIGHTON: It’s an extremely hard area; you can’t go and just throw a bunch of servers out there and have them all work with each other. The servers themselves are going to fail. Processors are going to fail. The Internet has all sorts of its own issues and failure modes. So all these kinds of things have to be built into the algorithmic approach. How do you develop a decentralized algorithm with imperfect information that is still going to work? That’s a huge challenge. But it’s clearly what you have to do. You can’t have any central point of failure or the system will come down. I can’t think of a component or a piece of hardware that hasn’t failed at some point or some place. So, it’s a given [that you need a distributed system].
When a client comes to one of our customers looking for content, we have to figure out where that client is, which of our locations at that moment is the best to serve the client from, and what load conditions are, so we don’t overload anything. We’ve got to handle flash crowds that are both geographic and content specific. We’ve got to replicate the content immediately to handle any of those kinds of issues, but you can’t afford to have copies of everything everywhere. You’ve got to make these decisions and respond back to the clients in milliseconds. It’s got to be automatic. And when pieces fail, you’ve got to compensate automatically for that.
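The constraint that you can’t afford copies of everything everywhere implies each edge location keeps a bounded cache and evicts cold content. A minimal sketch of that idea using least-recently-used eviction; the class name, capacity, and URLs are illustrative assumptions, not Akamai’s actual replication policy:

```python
from collections import OrderedDict

class EdgeCache:
    """Hypothetical bounded edge cache: hot content is replicated locally
    on demand, and the least recently used object is evicted when the
    location runs out of room."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, url):
        if url in self.store:
            self.store.move_to_end(url)  # mark as recently used
            return self.store[url]
        return None  # miss: the edge would fetch from origin, then put()

    def put(self, url, body):
        self.store[url] = body
        self.store.move_to_end(url)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the coldest object
```

Under a flash crowd, the suddenly popular object is fetched once per edge location and then served locally, while rarely requested content quietly ages out.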