Managing Users by the Million

Erica Naone

June 21, 2011

At the heart of social networks are their enormous repositories of data: personal details and messages, huge quantities of photos and videos, and the complex web of connections that reflect the actual social relationships among users. Successful networks are those that have mastered how to store, secure, and quickly access and analyze this data.

**Computing power:** Featuring energy-saving innovations, Facebook’s new data center in Prineville, Oregon, was custom built to handle the social network’s demands.

The numbers are staggering. By late 2010, Twitter’s users were generating 12 terabytes of data a day, which adds up to more than four petabytes a year, or roughly the equivalent of 83,000 Blu-ray discs, and that’s assuming no further user growth.
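The arithmetic behind those figures can be checked with a short sketch. The 12-terabytes-a-day rate comes from the article; the 50 GB disc capacity (a dual-layer Blu-ray disc, in decimal units) and the function names are assumptions made here for illustration, so the disc count it produces is a ballpark that lands near, not exactly on, the rounded figure quoted above.

```python
# Back-of-the-envelope check of the data volumes quoted in the article.
TB_PER_DAY = 12          # daily volume quoted in the article
DAYS_PER_YEAR = 365
DISC_CAPACITY_GB = 50    # assumption: dual-layer Blu-ray disc, decimal gigabytes


def yearly_volume_tb(tb_per_day: float, days: int = DAYS_PER_YEAR) -> float:
    """Total terabytes generated in a year at a constant daily rate."""
    return tb_per_day * days


def discs_needed(total_tb: float, disc_gb: float = DISC_CAPACITY_GB) -> float:
    """Number of discs required to hold total_tb (1 TB = 1000 GB)."""
    return total_tb * 1000 / disc_gb


yearly_tb = yearly_volume_tb(TB_PER_DAY)
yearly_pb = yearly_tb / 1000
discs = discs_needed(yearly_tb)

print(f"{yearly_tb} TB per year, about {yearly_pb:.2f} PB")
print(f"about {discs:,.0f} discs at {DISC_CAPACITY_GB} GB each")
```

Changing `DISC_CAPACITY_GB` shows how sensitive the disc comparison is to the capacity assumed, while the yearly total depends only on the daily rate holding steady.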

The networks rely partly on hardware to deal with this flood of data, building large data centers stuffed with servers. The other part of the solution is the software: many companies are contributing to open-source code designed to handle big databases. Twitter uses an open-source database called Cassandra that’s designed to work at large scales, with processing tasks distributed across a variety of relatively cheap servers.

In addition to storing data, keeping up with users is a challenge, even when they do something as apparently simple as clicking a “Like” button. Every time a user reports having watched a television program, for example, there’s already been “a lot of calculation to support that,” explains Alex Iskold, founder and CEO of Adaptive Blue, the company that maintains the entertainment-oriented social network GetGlue (see “Turn On, Check In”). Each such note causes a cascade of calculations about what other media content to recommend, what information to display to other users, and whether any promotional incentives should be offered to the user. “The hardest thing to deal with is bursts,” Iskold says, referring to the surges in traffic during big events like the Oscars.

The information that social networks provide about users’ connections and interests has, despite initial doubts about its commercial value, proved incredibly lucrative. Social networks typically analyze the personal information provided by users to offer advertisers closely targeted commercial placement, a business that’s worth billions of dollars per year and growing. The value lies in “the combination of technology and identity,” says Jascha Kaykas-Wolff, senior vice president of marketing and customer success for Involver, a company that builds technology to help its customers create social-marketing campaigns. Data analysis tools are also being used to improve search results (see “Personalized Search”).

The value of these networks attracts scammers as well as advertisers. In 2011, the security company Sophos reported that 40 percent of those who use social-network sites have received malware, 43 percent have been subjected to phishing attacks, and 67 percent have received spam. In response, network operators have begun watching for patterns that indicate malicious activity (for example, a link being shared among users faster than a human could reasonably accomplish it) and trying to develop technology to block these attacks before they reach users. The social-gaming network Zynga tracks sites that host hacks, bots, and cheats and monitors users suspected of bad behavior. Facebook has also recently introduced the option of texting a pass code to a user’s phone when that person’s account is accessed from a new computer, in hopes of preventing unauthorized access to an account if a password is compromised.
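One of the patterns mentioned above, a link being shared faster than a human could reasonably manage, can be illustrated with a sliding-window rate check. This is a minimal sketch under stated assumptions, not a description of how any particular network does it: the class name `ShareRateMonitor`, the thresholds, and the per-account keying are all hypothetical, and timestamps are assumed to arrive in nondecreasing order.

```python
from collections import defaultdict, deque


class ShareRateMonitor:
    """Flags accounts that share links more often than a set threshold.

    Hypothetical illustration: the names and thresholds are assumptions,
    not taken from any real network's system.
    """

    def __init__(self, max_shares: int, window_seconds: float):
        self.max_shares = max_shares
        self.window_seconds = window_seconds
        # account id -> timestamps of that account's recent shares
        self._events = defaultdict(deque)

    def record_share(self, account_id: str, timestamp: float) -> bool:
        """Record one share; return True if the account is over the limit.

        Assumes timestamps for a given account are nondecreasing.
        """
        events = self._events[account_id]
        events.append(timestamp)
        # Discard shares that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_shares


# Usage: allow at most 3 shares in any 10-second window.
monitor = ShareRateMonitor(max_shares=3, window_seconds=10)
flags = [monitor.record_share("account-1", t) for t in (0, 1, 2, 3)]
later = monitor.record_share("account-1", 20)
print(flags, later)
```

A real system would presumably combine many such signals; the window-based counter is only one simple way to turn "faster than a human" into a testable rule.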
