Last week, Facebook, the world’s biggest social network, announced that it had reached 300 million users and was making enough money to cover its costs.
The challenge of dealing with such a huge number of users has been highlighted by hiccups at other social-networking sites. Twitter was plagued by scaling problems for some time and became infamous for its “Fail Whale,” the image that appears when the microblogging site’s services are unavailable.
In contrast, Facebook’s efforts to scale have gone remarkably smoothly. The site handles about a billion chat messages each day and, at peak times, serves about 1.2 million photos every second.
Facebook vice president of engineering Mike Schroepfer will appear on Wednesday at Technology Review’s EmTech@MIT conference in Cambridge, MA. He spoke with assistant editor Erica Naone about how the company has handled a constant flow of new users and new features.
Technology Review: What makes scaling a social network different from, say, scaling a news website?
Mike Schroepfer: Almost every view on the site is a logged-in, customized page view, and that’s not true for most sites. So what you see is very different than what I see, and is also different than what your sister sees. This is true not just on the home page, but on everything you look at throughout the site. Your view of the site is modified by who you are and who’s in your social graph, and it means we have to do a lot more computation to get these things done.
TR: What happens when I start taking actions on the site? It seems like that would make things even more complex.
MS: If you’re a friend of mine and you become a fan of the Green Day page, for example, that’s going to show up on my home page, maybe in the highlights, maybe in the “stream.” If it shows me that, it’ll also say three of [my] other friends are fans. Just rendering that home page requires us to query this really rich interconnected dataset (we call it the graph) in real time and serve it up to the users in just a few seconds or hopefully under a second. We do that several billion times a day.
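To make the kind of query Schroepfer describes concrete, here is a minimal sketch in Python. It is not Facebook’s code. The `SocialGraph` class, its method names, and the sample users are all hypothetical, and the whole graph is assumed to fit in one process’s memory, which the real system does not.

```python
from collections import defaultdict


class SocialGraph:
    """Toy in-memory graph (hypothetical; for illustration only)."""

    def __init__(self):
        self.friends = defaultdict(set)  # user -> set of that user's friends
        self.fans = defaultdict(set)     # page -> set of users who are fans

    def add_friendship(self, a, b):
        # Friendship is mutual, so store the edge in both directions.
        self.friends[a].add(b)
        self.friends[b].add(a)

    def become_fan(self, user, page):
        self.fans[page].add(user)

    def friends_who_are_fans(self, viewer, page):
        # Intersect the viewer's friend set with the page's fan set.
        return self.friends[viewer] & self.fans[page]

    def story_for(self, viewer, actor, page):
        # Build the story one particular viewer would see about the actor.
        others = self.friends_who_are_fans(viewer, page) - {actor}
        text = f"{actor} became a fan of {page}."
        if others:
            text += f" {len(others)} of your other friends are also fans."
        return text


graph = SocialGraph()
for friend in ("bea", "cal", "dee", "eli"):
    graph.add_friendship("ann", friend)
for fan in ("bea", "cal", "dee", "zed"):
    graph.become_fan(fan, "Green Day")

story = graph.story_for("ann", "bea", "Green Day")
print(story)
```

The same event produces a different story for each viewer, because the result depends on that viewer’s own friend set. That dependence is why a page rendered for one user cannot simply be stored and reused for another.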
TR: How do you handle that? Most sites deal with having lots of users by caching: calculating a page once and storing it to show many times. It doesn’t seem like that would work for you.
MS: Your best weapon in most computer science problems is caching. But if, like the Facebook home page, it’s basically updating every minute or less than a minute, then pretty much every time I load it, it’s a new page, or at least has new content. That kind of throws the whole caching idea out the window. Doing things in or near real time puts a lot of pressure on the system because the live-ness or freshness of the data requires you to query more in real time.
We’ve built a couple of systems behind that. One of them is a custom in-memory database that keeps track of what’s happening in your friends’ network and is able to return the core set of results very quickly, much more quickly than having to go and touch a database, for example. And then we have a lot of novel system architecture around how to shard and split out all of this data. There’s too much data updated too fast to stick it in a big central database. That doesn’t work. So we have to separate it out, split it out, to thousands of databases, and then be able to query those databases at high speed.
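The two ideas in this answer, keeping recent activity in memory and splitting it across many stores, can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of Facebook’s architecture. The `Shard` and `ShardedActivityStore` classes are hypothetical, the shards are plain Python objects in one process rather than separate servers, and routing by a hash of the user ID is just one common way to partition data.

```python
import hashlib
from collections import defaultdict, deque


class Shard:
    """One in-memory partition holding recent activity for some users."""

    def __init__(self, max_items_per_user=50):
        # Each user's deque keeps only the newest max_items_per_user events.
        self.activity = defaultdict(lambda: deque(maxlen=max_items_per_user))

    def record(self, user_id, timestamp, event):
        self.activity[user_id].append((timestamp, event))

    def recent(self, user_id):
        # .get() avoids creating an empty entry for users with no activity.
        return list(self.activity.get(user_id, ()))


class ShardedActivityStore:
    """Routes each user to a shard by hashing the user ID (an assumption)."""

    def __init__(self, num_shards=8):
        self.shards = [Shard() for _ in range(num_shards)]

    def shard_for(self, user_id):
        # A stable hash sends the same user to the same shard every time.
        digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def record(self, user_id, timestamp, event):
        self.shard_for(user_id).record(user_id, timestamp, event)

    def feed(self, friend_ids, limit=10):
        # Fan out: ask each friend's shard for recent events, then merge
        # the results and keep the newest ones.
        items = []
        for friend_id in friend_ids:
            for timestamp, event in self.shard_for(friend_id).recent(friend_id):
                items.append((timestamp, friend_id, event))
        items.sort(reverse=True)
        return items[:limit]


store = ShardedActivityStore(num_shards=4)
store.record("bea", 100, "became a fan of Green Day")
store.record("cal", 105, "posted a photo")
store.record("bea", 110, "updated status")

feed = store.feed(["bea", "cal", "eli"], limit=2)
print(feed)
```

In this sketch no single store holds everything, so a home page is assembled by querying the shards that hold the viewer’s friends and merging what comes back. A real deployment would also have to handle adding shards, shard failures, and network latency, none of which is modeled here.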