How Facebook Works

The social network’s technology manages a vast and rapidly expanding web of connections for its millions of users.

Facebook is a wonderful example of the network effect, in which the value of a network to a user grows with the number of other users that network has.
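One common way to make the network effect concrete is to count how many distinct pairs of users could be connected. The sketch below is an illustration, not something from Rothschild: the function name `possible_connections` is made up here, and treating every pair of users as a potential friendship is a simplifying assumption. Under that assumption the count grows roughly with the square of the number of users, which is fast but not strictly exponential.

```python
def possible_connections(users: int) -> int:
    """Number of distinct user pairs, n * (n - 1) / 2, assuming any two users may connect."""
    if users < 0:
        raise ValueError("users must be non-negative")
    return users * (users - 1) // 2


if __name__ == "__main__":
    # Ten times as many users yields roughly a hundred times as many possible pairs.
    for n in (10, 100, 1000):
        print(n, possible_connections(n))
```

Real friend lists are far sparser than this upper bound, so the figure is best read as a ceiling on how much there could be to store and look up.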

Facebook’s power derives from what Jeff Rothschild, its vice president of technology, calls the “social graph”: the sum of the wildly varied connections between the site’s users and their friends; between people and events; between events and photos; between photos and people; and between a huge number of discrete objects linked by metadata describing them and their connections.

Facebook maintains data centers in Santa Clara, CA; San Francisco; and Northern Virginia. The centers are built on three tiers of x86 servers loaded up with open-source software, some of which Facebook has created itself.

Let’s look at the main facility, in Santa Clara, and then show how it interacts with its siblings.

The top tier of the Facebook network is made up of the Web servers that create the Web pages that users see, most with eight cores running 64-bit Linux and Apache. Many of the social network’s pages and features are created using PHP, a scripting language widely used for generating Web pages. But Facebook also develops complex core applications using a variety of full-featured computer languages, including C++, Java, Python, and Ruby. To manage the complexity of this approach, the company created Thrift, an application framework that lets programs written in different languages work together.

The bottom tier consists of eight-core Linux servers running MySQL, an open-source database server application. Rothschild estimates that Facebook has about 800 such servers distributing about 40 terabytes of user data. This tier stores all the metadata about every object in the database, such as a person, photo, or event.

The middle tier consists of caching servers. Even 800 database servers can’t serve up all the needed data: Facebook receives 15 million requests per second for both data and connections. Bulked-up cache servers, running Linux and the open-source memcached software, fill the gap. About 95 percent of data queries can be filled from the cache servers’ 15 terabytes of RAM, so that only about 500,000 queries per second have to be passed to the MySQL databases and their relatively slow hard drives.
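The way a cache tier shields a database can be sketched in a few lines. What follows is a minimal look-aside cache, assuming a plain in-memory dictionary in place of real cache servers; the names `LookAsideCache`, `load_from_db`, and `database_load` are hypothetical and are not taken from Facebook’s code or from any cache client library.

```python
from typing import Any, Callable, Dict, Hashable


class LookAsideCache:
    """In-memory stand-in for a cache tier sitting in front of a slower database."""

    def __init__(self, load_from_db: Callable[[Hashable], Any]) -> None:
        self._load_from_db = load_from_db  # hypothetical database lookup function
        self._store: Dict[Hashable, Any] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        # Serve from memory when possible; this is the fast path.
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # On a miss, fall through to the database and remember the answer.
        self.misses += 1
        value = self._load_from_db(key)
        self._store[key] = value
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def database_load(requests_per_second: float, hit_rate: float) -> float:
    """Requests per second that miss the cache and reach the database tier."""
    return requests_per_second * (1.0 - hit_rate)


if __name__ == "__main__":
    cache = LookAsideCache(load_from_db=lambda key: f"row-{key}")
    for key in (1, 2, 1, 1, 2, 3):
        cache.get(key)
    print(cache.hits, cache.misses, cache.hit_rate())
    print(database_load(15_000_000, 0.95))
```

Plugging in the figures quoted above, a 95 percent hit rate on 15 million requests per second leaves about 750,000 per second for the database, while 500,000 per second corresponds to a hit rate closer to 97 percent, so the quoted numbers are evidently rounded. The sketch also omits eviction and invalidation, which any real cache tier would need.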

Photos, videos, and other objects that populate the Web tier are stored in separate filers within the data center.

The San Francisco facility replicates the Web and cache tiers, as well as the filers with the database objects, but it uses the Santa Clara MySQL database tier.

The Virginia data center is too far away to share MySQL databases: with 70 milliseconds of Internet delay, give or take, it just won’t work. Thus, it completely duplicates the Santa Clara facility, using MySQL replication to keep the database tiers in sync.
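A rough calculation suggests why 70 milliseconds is too much. The sketch below assumes the delay is paid once for every database query and that a page issues its queries one after another; the query counts are hypothetical, since the article gives no per-page figure, and the function name `added_page_latency_ms` is invented for illustration.

```python
def added_page_latency_ms(delay_per_query_ms: float, sequential_queries: int) -> float:
    """Extra wait per page when each query in a sequence crosses the long-distance link."""
    if sequential_queries < 0:
        raise ValueError("sequential_queries must be non-negative")
    return delay_per_query_ms * sequential_queries


if __name__ == "__main__":
    # 70 ms comes from the article; the query counts are assumptions.
    for queries in (1, 5, 20):
        print(queries, added_page_latency_ms(70, queries))
```

Under these assumptions, even a handful of dependent queries adds a noticeable fraction of a second to every page, which a local replica of the database avoids.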

What’s next for Facebook’s technology? For one thing, says Rothschild, the company has discovered that interrupts on the servers’ Ethernet controllers (which let the servers process myriad requests arriving at the same time) are a bottleneck, since they’re generally handled by only one core. So Facebook rewrote the controllers’ drivers to scale on multicore systems. Facebook is also experimenting with solid-state drives, which could speed the performance of the MySQL database tier by a factor of 100.

Given that Facebook is growing, and that the number of possible connections grows far faster than the number of users, the site is going to need that performance soon.
