In a Data Deluge, Companies Seek to Fill a New Role
A job invented in Silicon Valley is going mainstream as more industries try to gain an edge from big data.
Communications, computing, and biomedical advances are spurring an explosion of data. How companies use it could determine their own survival.
The job description “data scientist” didn’t exist five years ago. No one advertised for an expert in data science, and you couldn’t go to school to specialize in the field. Today, companies are fighting to recruit these specialists, courses on how to become one are popping up at many universities, and the Harvard Business Review even proclaimed that data scientist is the “sexiest” job of the 21st century.
Data scientists take huge amounts of data and attempt to extract useful information from it. The job combines statistics and programming to identify sometimes subtle factors that can have a big impact on a company’s bottom line, from whether a person will click on a certain type of ad to whether a new chemical will be toxic in the human body.
While Wall Street, Madison Avenue, and Detroit have always employed data jockeys to make sense of business statistics, the rise of this specialty reflects the massive expansion in the scope and variety of data now available in some industries, like those that collect data about customers on the Web. There’s more data than individual managers can wrap their minds around—too much of it, changing too fast, to be analyzed with traditional approaches.
As smartphones promise to become a new source of valuable data for retailers, for example, Walmart is competing to bring more data scientists on board and now advertises dozens of open positions, including “Big Fast Data Engineer.” Sensors in factories and on industrial equipment are also delivering mountains of new data, leading General Electric to hire data scientists to analyze these feeds.
The term “data science” was coined in Silicon Valley in 2008 by two data analysts then working at LinkedIn and Facebook (see “What Facebook Knows”). Now many startups are basing their businesses on their ability to analyze large quantities of data—often from disparate sources. ZestFinance, for example, has a predictive model that uses hundreds of variables to determine whether a lender should offer high-risk credit. The underwriting risk it achieves is 40 percent lower than that borne by traditional lenders, says ZestFinance data scientist John Candido. “All data is credit data to us,” he says.
Data scientist has become a popular job title partly because it has helped pull together a growing number of haphazardly defined and overlapping job roles, says Jake Klamka, who runs a six-week fellowship to place PhDs from fields like math, astrophysics, and even neuroscience in such jobs. “We have anyone who works with a lot of data in their research,” Klamka says. “They need to know how to program, but they also have to have strong communications skills and curiosity.”
The best data scientists are defined as much by their creativity as by their code-writing prowess. The company Kaggle organizes contests where data scientists compete to find the best way to make sense of massive data sets (see “Startup Turns Data Crunching into a High-Stakes Sport”). Many of the top Kagglers (there are 88,000 registered on the site) come from fields like astrophysics or electrical engineering, says CEO Anthony Goldbloom. The top-ranked participant is an actuary in Singapore.
Universities are starting to respond to the job market’s needs. Stanford University plans to launch a data science master’s track in its statistics department, says department chair Guenther Walther. A dozen or so other programs have already been started at schools including Columbia University and the University of California, San Francisco. Cloudera, a company that sells software to process and organize large volumes of data, announced in April that it would work with seven universities to offer undergraduates professional training on how to work with “big data” technologies.
Cloudera’s education program director, Mark Morissey, says a skills shortage is looming and that “the market is not going to grow at the rate it currently wants to.” That has driven salaries up. In Silicon Valley, salaries for entry-level data scientists are around $110,000 to $120,000.
Others think the trend could create a new area of outsourcing. Shashi Godbole, a data scientist in Mumbai, India, who is ranked 20th on Kaggle’s scoreboard, recently completed an hourly consulting gig arranged by Kaggle, a new line of business for the platform. He did the work for a tiny health advocacy nonprofit in Chicago and is now bidding on more jobs (he earns $200 an hour, and Kaggle collects $300 an hour). His Kaggle work is part time for now, but he says it could one day become his major source of income.
To the data scientists themselves, the job is certainly less sexy than it’s being made out to be. Josh Wills, a senior director of data science at Cloudera, says most of the time it involves cleaning up messy data—for example, by putting it in the right columns and sorting it.
“I’m a data janitor. That’s the sexiest job of the 21st century,” he says. “It’s very flattering, but it’s also a little baffling.”