Amazon’s $150 Million Typo Is a Lightning Rod for a Big Cloud Problem

A botched command inadvertently took down swaths of the Web, and it reveals that centralized Web services need to be built more robustly.

If you’re going to put all your data in the cloud, you want it to be a well-built cloud. This week, Amazon—the world’s largest provider of such infrastructure—showed that construction skills are still lacking.

On Tuesday, large parts of the Internet simply stopped working. Slack wouldn’t let people communicate with colleagues, Trello wouldn’t let them manage projects, and, sadly, the MIT Technology Review website wouldn’t let them read about emerging technology. There were also complaints about smart-home hardware failing to work properly.

The reason: Amazon’s S3 cloud storage system failed. Because so many services rely on it, they were unable to function properly either. And this wasn’t just a blip: the problem took at least four hours to fix.

It’s hard to accurately quantify the true cost of such an outage. But, according to the Wall Street Journal, analytics firm Cyence has estimated that it cost S&P 500 companies at least $150 million. And the traffic monitoring firm Apica claims that 54 of the top 100 online retailers saw site performance slump by at least 20 percent. So there’s no way around the fact that it was expensive.

That makes the reason it happened all the more embarrassing. In a statement describing what went wrong, Amazon has admitted that the root cause of the outage was an incorrect command executed by a staff member at its Northern Virginia facility during routine maintenance. Sadly, it resulted in a catastrophic cascade of events.

The worker was supposed to take a small number of servers offline, but made a mistake and took out more servers than intended, including two that powered fundamental processes used across the entire system. The mistake essentially wiped out the facility’s ability to process user requests.

Amazon operates multiple cloud "regions" dotted around the world, and customers can store files and run code in more than one of them. But doing so is more expensive and, as the Register notes, even companies that do run their services across several regions found their systems falling over, likely due to capacity issues.

Just four days before the outage, we described the inherent risks of centralized Web services and speculated about the impact that would be felt if Amazon’s cloud service failed. At the time, we warned that “the stakes are high,” arguing that “security, reliability, and competency” are vital—and perhaps underrepresented—for companies that provide centralized Web services.

Amazon appears to agree. It has already put safeguards in place so that a mistake like the staff member’s can’t take as many servers offline as quickly in the future.

That’s a start. But it’s clear at this point that cloud services need extra insurance policies if they’re to be robust. Amazon, for instance, shouldn’t have been able to wind up in a situation where its entire Northern Virginia facility could fail at once. Instead, the facility should be split into separate subsystems that work independently.

Even then, centralized Web services may still be vulnerable. If a hacker levels a huge attack at a single provider, using a botnet, for instance, they could still force large parts of the Web offline again. But at least it wouldn’t be the result of a typo.

(Read more: Wall Street Journal, the Register, AP, Amazon Web Services, “Centralized Web Services Are Wonderful—Until They Go Wrong,” “10 Breakthrough Technologies: Botnets of Things”)
