Twitter’s Growing Pains

Plagued by service outages, the microblogging site rebuilds its infrastructure.

John Borland

July 21, 2008

As every Twitter addict knows, the popular messaging system has suffered long and frustrating service outages, measured in hours or even tens of hours per month.

User complaints and defections to rival services have risen, and in recent months, some developers have even said that they’ve suspended work on their Twitter-related projects. But as these problems have crested, Twitter has been increasingly open about what executives say is a large-scale, ground-up architectural revamp aimed at improving the service’s stability.

The company has avoided detailing timing or technical specifics. But cofounder Biz Stone said in an e-mail interview last week that the changes are under way, and that users should already be benefiting from the results.

“We are improving the system organically,” Stone says. “We are already seeing correspondingly gradual improvements in stability as a result of our piece-by-piece improvements.”

Twitter’s service allows users to post 140-character messages online, and it lets other users receive these messages on cell phones or computers. According to the company’s engineers, the service was originally built using “technologies and practices” suited to a content management system. Such systems generally use databases to organize content for publication, whether online or in print. The content management system used to produce this page, for example, has separate database entries for the story’s title, the author’s name, the date of publication, and the like. Although such systems allow users to create and revise content, they are not designed for real-time exchange of data.

But Twitter quickly evolved into a genuine communications network, with its own fast-paced style of conversation and group messaging. The various hacks that Twitter's engineers used to turn a content management system into a messaging service haven't prevented repeated collapses.

The system's faults have been much dissected online. Twitter was originally written using the Ruby on Rails development framework, which provides considerable programming power and flexibility but can be slow when interacting with a back-end database under heavy use. Nor was the system's original MySQL database structure well suited to handling the complex, fast-paced network of queries spawned by users "following" the updates of thousands of others.

The Twitter team’s responses have been akin to triage at times: turning off features such as the ability to use the service through instant-messaging programs, or temporarily shutting down the “Reply” function, an extremely popular means of facilitating conversations between users.

The company has also periodically, and unpredictably, changed the frequency with which external applications can request Twitter data by using the company’s application programming interface (API), a particular thorn in the side of developers who use this conduit for their own applications.

But after a particularly painful May and June, in which downtimes were unusually frequent and protracted, the company may be turning a corner.

With a new venture funding round just completed, a set of new engineering hires, and a days-old acquisition aimed at beefing up Twitter’s search and filtering capabilities, the company is bolstering its community’s flagging confidence. “Right now, we’re looking at restarting development,” says Brian Breslin, chief executive officer of Infinimedia, a company that several months ago halted work on its Twitter-related Firefox extension, Twitbin. “I think they’re taking the right steps, and not rushing into anything.”

Outages and short-term fixes continue, as was illustrated by the several hours of downtime late last week, attributed to an “unplanned maintenance event.” But over the long term, the company plans to “replace our existing system, component by component, with parts that are designed from the ground up,” according to a recent blog posting by engineer Alex Payne.

Elsewhere, the company's founders and developers have written that it will reduce its reliance on Ruby on Rails and will replace its original, unwieldy database system with a "simple, elegant file-system-based approach."

“We are not starting from scratch,” Stone says. “However, we are experimenting with different approaches here, and we’ve already moved a lot away from the database.”

The company’s acquisition this week of search company Summize will also add a range of new capabilities focused on searching and filtering Twitter users’ posts, or “tweets.” The deal brings five new engineers in house who will “be invaluable in our continued efforts to improve system reliability and performance,” Stone says.

The big question is whether these improvements will be able to keep up with users’ still-evolving demands on the system.

Eran Hammer-Lahav, now an open-standards evangelist at Yahoo, worked on similar problems while building a (now defunct) microblogging service called Nouncer. He says that Twitter is in an unenviable position for a startup, having to build a communications-class technical infrastructure that supports unpredictable activity.

“The product is not completely driven by them. It’s driven by how people use it,” Hammer-Lahav says. “The question isn’t how do you scale a microblogging service in theory. It’s how do you scale microblogging to handle the way users are using Twitter right now.”

Stone says that this time the company is trying to build in flexibility for the unforeseen. "We can stay flexible by identifying the small pieces that add up to the whole," he says. "In so doing, Twitter can respond to new challenges. I'm picturing a flock of birds moving as one around objects, or instantly changing direction in flight."

Indeed, Twitter’s problems don’t seem to have arrested its momentum. According to Twitter-tracking service TwitDir.com, the company now has more than two million users. Visits to Twitter’s website have climbed steeply in recent months, despite outages, according to Web monitoring company Hitwise.

“People are so intertwined with Twitter now, they’ll accept the problems,” Breslin says.
