Facebook’s Engineering Challenges Show How Fast It’s Growing

More than half the data the site currently stores was added in the last year.

Erica Naone

June 23, 2010

Numbers on Facebook’s exponential growth often get thrown around, but it can be hard to grasp what those numbers mean. The site is becoming a repository for huge quantities of information about its users’ daily lives, and the scale of that shift is easier to appreciate when company executives describe how hard Facebook has to work to manage it all. Bobby Johnson, director of engineering at Facebook, spoke this morning at the Usenix WebApps ’10 conference in Boston, where he outlined how the company handles the technical and organizational problems created by its rapid rise.

The site’s 400 million users have an average of 130 friends each, and the social graph data alone amounts to tens of terabytes. What’s more, this data has to stay accessible at all times, since it is used to respond to almost any action a user takes. For example, when a user logs in, data about that person’s social connections determines what information appears in the user’s news feed (the first screen shown after login).
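To put the first two figures in perspective, here is a minimal back-of-envelope sketch in Python. It uses only the numbers quoted above (400 million users, about 130 friends each) and assumes, as is the case on Facebook, that a friendship is mutual and therefore appears in both users’ friend lists. It says nothing about how Facebook actually stores these links.

```python
# Back-of-envelope size of the social graph, using the figures quoted
# in the talk: 400 million users with an average of 130 friends each.
users = 400_000_000
avg_friends = 130

# Total entries across every user's friend list (sum of all degrees).
friend_list_entries = users * avg_friends

# Assumption: friendships are mutual, so each one is counted twice
# above (once in each friend's list).
friendships = friend_list_entries // 2

print(f"{friend_list_entries:,} friend-list entries")  # 52,000,000,000
print(f"{friendships:,} distinct friendships")         # 26,000,000,000
```

Even before any profile data, photos, or activity is attached, that works out to tens of billions of connections that must be readable whenever a user logs in.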

On top of keeping track of users’ connections to each other, Facebook has increasingly become the archive for users’ personal memories. It has long been the largest photo-sharing site on the Web, and its virtual photo albums have in many cases replaced the paper albums that used to sit on people’s shelves.

But while the accumulation of photos and videos may become an issue for the site in the long run, Johnson says that for now the main challenge is handling new data. More than half of the data currently on the site was added this year, he says. Facebook plans never to delete old data, but even if it did, Johnson notes, that would do little to ease the burden of storing the flood of new data.

The company clearly takes the responsibility of storing all this data seriously: it routinely replicates information at least three times to protect it from hardware failures and bugs. Still, it is stunning to contemplate how much responsibility the company now bears for information belonging to a growing number of people around the world.
