If you’re going to put all your data in the cloud, you want it to be a well-built cloud. This week, Amazon—the world’s largest provider of such infrastructure—showed that construction skills are still lacking.
On Tuesday, large parts of the Internet simply stopped working. Slack wouldn’t let people communicate with colleagues, Trello wouldn’t let you manage a project, and, sadly, the MIT Technology Review website wouldn’t let you read about emerging technology. There were also complaints about smart-home hardware failing to work properly.
The reason: Amazon’s S3 cloud storage system failed. Amazon is the world’s largest cloud computing provider, so many services that rely upon it were also unable to function properly. And this wasn’t just a blip: the problem took at least four hours to fix.
It’s hard to accurately quantify the true cost of such an outage. But, according to the Wall Street Journal, analytics firm Cyence has estimated that it cost S&P 500 companies at least $150 million. And the traffic monitoring firm Apica claims that 54 of the top 100 online retailers saw site performance slump by at least 20 percent. So there’s no way around the fact that it was expensive.
That makes the reason it happened all the more embarrassing. In a statement describing what went wrong, Amazon has admitted that the root cause of the outage was an incorrectly entered command, executed by a staff member at its Northern Virginia facility during routine maintenance. That single error set off a catastrophic cascade of events.
The worker was supposed to take a small number of servers offline, but made a mistake and took out more servers than intended, including servers supporting two subsystems that underpin processes used across the entire system. The mistake essentially wiped out the facility’s ability to process user requests.
Amazon operates multiple cloud “regions” dotted around the world, and customers can store files and run code in more than one of them. But doing so is more expensive and, as the Register notes, even companies that do spread their services across several geographies found their systems falling over, likely due to capacity issues.
Just four days before the outage, we described the inherent risks of centralized Web services and speculated about the impact that would be felt if Amazon’s cloud service failed. At the time, we warned that “the stakes are high,” arguing that “security, reliability, and competency” are vital—and perhaps underrepresented—for companies that provide centralized Web services.
Amazon appears to agree. It’s already put in place safeguards so that incidents like the one brought about by the ham-fisted staff member can’t shut down as many servers quite as quickly in the future.
That’s a start. But it’s clear at this point that cloud services need extra insurance policies if they’re to be robust. Amazon, for instance, should never have been able to end up in a situation where its entire Northern Virginia facility could fail at once. Instead, the facility should be split into separate subsystems that work independently.
Even then, centralized Web services may still be vulnerable. If a hacker levels a huge attack at a single provider—using a botnet, for instance—he could still force large parts of the Web offline again. But at least it wouldn’t be the result of a typo.