Some organizations are already caught in that flood. Consider Facebook. Already host to more digital photos than any other company, Facebook is building new storage and processing infrastructure as fast as it can. Yet it is pushing its database technology to the limit, splitting its famed social graph across 4,000 databases that must all work together as one, Stonebraker says. “They are just dying under the load of the management layer needed to keep this system up,” he says. “They have the hardest database problem on the planet, and there’s no current system that will meet their needs.”
Solutions that Stonebraker is building for a very different sector already drowning in data may eventually help. A few years ago, he heard of the problems facing the Large Synoptic Survey Telescope under construction in Chile. “It is going to assemble 100 petabytes of raw data and derived data,” says Stonebraker, “and they had no clue what to do with that much.”
Stonebraker and collaborator David DeWitt of the University of Wisconsin-Madison built a new kind of database system, named SciDB. The open-source project now has venture backing and a large community of volunteers from the scientific community. But Stonebraker thinks features of SciDB will eventually find favor beyond academia.
“All science data is uncertain and has error bars, unlike the data in a salary database, so SciDB can pay attention to uncertainty. It also cannot overwrite, because science guys never want to throw anything away,” he says. Those features are not so different from the needs of the high-powered, statistics-heavy analytics, or “data science,” increasingly at the heart of successful technology-led businesses. One example is online ad placement: targeting every person individually requires computationally intensive analysis to cluster similar people together.
However, Stonebraker doesn’t claim that new database systems like those he is working on will be a panacea for companies suddenly discovering the limits of more established technologies. As data storage and processing grow more important to businesses of all kinds, companies will have to make both a higher priority. “If you’re running a company, you’ve got to engineer in scale from the beginning,” he says, “because there’s no doubt you will need it later.”