In 2001, Jesse Robbins applied for two jobs: one as a Seattle bus driver and another as a backup systems engineer at Amazon.com. Amazon called first, and in the decade that followed, Robbins transformed the way Web companies design and manage complex networks of servers and software.
A former volunteer firefighter, Robbins brought an emergency responder’s mind-set to his work. He taught Amazon that with data centers distributed around the world, a massive shopping site, and intricate fulfillment operations, some unpredictable and spectacular failures were inevitable. Rather than try to defy that inevitability, Robbins says, he made it safe for Amazon to fail, building fault tolerance into its architecture. Then he tested the Web operations teams with live drills, knocking entire data centers offline. Customers didn’t notice a thing.
After leaving Amazon in 2006, Robbins began blogging about his techniques. In 2007, he founded Velocity, now an annual conference, where fierce competitors such as Microsoft and Google share information about handling infrastructure problems.
In 2008, Robbins cofounded Opscode. Its main product, Chef, is an open-source automation tool that lets administrators describe cloud-based infrastructure in code and manage it automatically. For example, one client used Chef to help scientists bring up and configure a 10,000-processor supercomputing cluster on Amazon’s pay-as-you-go cloud in 45 minutes, solve some difficult problems related to protein binding in eight hours, and then shut the cluster down, all for a small fraction of what it would cost to build a supercomputer or buy time on one. —Jessica Mintz