Swaths of the Web disappeared yesterday, revealing just how heavily many of its users have unwittingly come to rely on Amazon for more than just their shopping. The retailer also rents out servers to other companies building websites, and one of the huge warehouses full of those computers began to experience problems early on Thursday, with widespread effects.
Amazon’s cloud crashed…taking with it Reddit, Quora, FourSquare, Hootsuite, parts of the New York Times, ProPublica and about 70 other sites.
Zynga, provider of Facebook games such as FarmVille, was also affected, as was the political site Talking Points Memo. They all rely on an Amazon service known as EC2, or Elastic Compute Cloud. Some sites remained down all day, and a handful of them, such as leading news aggregator Reddit and in-vogue question-and-answer site Quora, still had only limited functionality at the time of this posting.
A group of hackers known as Anonymous attacked Amazon in December and couldn’t make any mark on the company’s famously robust servers. The outage wasn’t because they had found a chink in Amazon’s armor, however.
“A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1,” said Amazon’s status page for its Web Services.
The lay translation is that chunks of storage on the servers at a data center near Washington DC (aka US-EAST-1) started making copies of the data they held. That used up the available storage and brought everything to a halt. The data center is partitioned into isolated clusters of computers that are supposed to ensure they don’t all fail at once, but the problem cascaded across those boundaries.
Some companies that relied on that data center survived or recovered faster because they also used other Amazon locations or other providers. Amazon itself wasn’t affected: like other Web giants such as Google, it has the luxury of building its own data centers, which it can carefully play off against one another to assure high reliability. That was one motivation for Facebook’s monster new data center unveiled last week.
Amazon will have to refund many customers for exceeding the four hours of downtime its customer agreement allows per year. Yet despite headlines about the dangers of cloud computing, the company is unlikely to lose much business. A near-unique offering of pay-as-you-go computing and storage power that can handle traffic spikes, malicious attacks and more is irresistible to engineers who just want to build new web services and features fast. A glance at the most comprehensive list of affected sites neatly shows how startups in particular love this approach. As Quora’s holding page put it,
We’d point fingers but we wouldn’t be where we are today without EC2.
As the current generation of Web 2.0+ companies raised on this approach matures, Amazon’s cloud is likely to become more critical, not less. Hopefully those using it will now be more aware of its vulnerabilities and plan accordingly.