Operations inside a computer’s solid-state memory run faster than those that need to access the moving parts of a disk drive, sometimes by a factor of 100. As a result, database software gets an immediate performance increase.
But there are numerous other, less obvious, advantages as well. Steve Graves, CEO of McObject, which sells an in-memory database, says that traditional database software usually goes to elaborate lengths to minimize disk usage and improve performance. Those lines of code can simply be omitted in an in-memory version of the program, making performance even snappier.
Another advantage of in-memory systems is that they simplify a number of messy, time-consuming tasks that now must occur before data stored for accounting purposes can be used to analyze a company’s operations. This work usually lands in the lap of a company’s already overworked IT crew, creating a bottleneck for departments trying to put corporate data to good use. SAP cofounder Hasso Plattner is so enthusiastic about the quest for new in-memory data tools that he coauthored a book on the topic.
The other significant new development in data involves traditional disk drives assembled in configurations of previously unthinkable size. Much of this work was first done at Google, in connection with its effort to index the entire Web. But public versions of many of those tools have since been developed, Linux-style, by the open-source community.
The best-known is Hadoop, a large-scale data storage approach being used by a growing number of businesses. The biggest known Hadoop cluster is a 25-petabyte system of 2,000 machines at Facebook. (A petabyte is a thousand terabytes, or a 1 followed by 15 zeros when counted in bytes.) Google is believed to operate even larger clusters, but the company doesn’t discuss the matter.
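To put those figures in proportion, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope average. It assumes decimal units (a terabyte as 10^12 bytes) and an even spread of data across machines, and it says nothing about how the cluster is actually configured.

```python
# Decimal storage units, matching "1 followed by 15 zeros" for a petabyte.
TERABYTE = 10**12
PETABYTE = 10**15

# Figures quoted in the article for the cluster.
cluster_bytes = 25 * PETABYTE
machines = 2000

# Average share of the total per machine, in terabytes.
# Assumption: data is spread evenly; real clusters also store replicas.
per_machine_tb = cluster_bytes / machines / TERABYTE

print(f"{per_machine_tb} TB per machine on average")
```

By this rough measure, each machine would account for about 12.5 terabytes of the total.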
These two architectural approaches to databases—in-memory and mega-scale—are being enhanced by changes in database software.
Traditional databases were designed to facilitate “transactions,” such as updating a bank account when an ATM withdrawal is made. They tend to be rigidly structured, with well-defined fields; the database for a payroll department would probably include fields for an employee’s name, Social Security number, tax filing status, and the like. The questions you can ask about the data are limited by the fields into which the data was entered in the first place. In contrast, the new approaches have a more forgiving way of handling “unstructured” data, like the contents of a Web page. As a result, users can ask questions that might not have even occurred to them when they were first setting up the database.
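The contrast can be made concrete with a small Python sketch. It is an illustration only: the records, the documents, and the function names are all hypothetical, and the inverted index shown here is just one common way of making free text searchable, not necessarily what any product mentioned in this article does.

```python
import re
from collections import defaultdict

# Structured data: every record has the same predefined fields,
# so questions are limited to those fields.
PAYROLL = [
    {"name": "A. Smith", "filing_status": "single"},
    {"name": "B. Jones", "filing_status": "married"},
]

def names_with_status(records, status):
    """Answer a question the schema was designed for."""
    return [r["name"] for r in records if r["filing_status"] == status]

# Unstructured data: free text with no fields at all (hypothetical samples).
DOCS = {
    "doc1": "Customer ordered a drill and asked about delivery",
    "doc2": "Visitor clicked on drill reviews then left the site",
    "doc3": "Customer asked about delivery delays for a ladder",
}

def build_index(docs):
    """Map every word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, *words):
    """Return the documents containing all of the given words."""
    hits = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*hits)) if hits else []

INDEX = build_index(DOCS)
print(names_with_status(PAYROLL, "single"))
print(search(INDEX, "drill", "delivery"))
```

The payroll lookup can only ask about fields that were defined up front, while the text index can answer questions about any word that happens to appear, including ones nobody anticipated when the data was collected.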
These two new data approaches have something else in common: they take advantage of new algorithmic insights. One example is “NoSQL,” a new, less-is-more approach to querying a database; in the interest of speed, it dispenses with many of the less-used features of Structured Query Language, long a database standard. Another example is “columnar” databases, which are based on research showing that analytical queries run more quickly when information is, in effect, stored in columns instead of rows.
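The columnar idea is easier to see in miniature. The Python sketch below uses made-up payroll records and hypothetical function names, and it shows only the difference in layout. Real columnar databases add compression and other optimizations that are not modeled here.

```python
# Row layout: each record keeps all of its fields together.
ROWS = [
    {"name": "Ada", "dept": "eng", "salary": 120},
    {"name": "Bo", "dept": "ops", "salary": 90},
    {"name": "Cy", "dept": "eng", "salary": 110},
]

def to_columns(rows):
    """Pivot row records into one list per field (a column layout)."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns

def total_salary_row_store(rows):
    """Must visit every whole record just to read one field from each."""
    return sum(row["salary"] for row in rows)

def total_salary_column_store(columns):
    """Reads a single contiguous list and ignores the other fields."""
    return sum(columns["salary"])

COLUMNS = to_columns(ROWS)
print(total_salary_row_store(ROWS), total_salary_column_store(COLUMNS))
```

Both functions return the same total. The difference is that the column version touches only the salary values, which matters when a table has many fields and a query needs just one or two of them.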
Obviously, enormous Web properties like Google or Facebook need new database approaches. But data technology vendors also trumpet success stories that have little or nothing to do with the Web.
For example, an in-memory system called HANA, made by SAP, enables the power-tool company Hilti to get customer information for its sales teams in a few seconds, rather than the several hours required by its traditional data warehousing software. That gives the sales staff virtually instant insights into a customer’s operations and order history.
Circle of Blue, a nonprofit that studies global water issues, is about to use in-memory products from QlikTech to aggregate masses of data for a complex study of the Great Lakes, says executive director J. Carl Ganter.
Financial-service providers are using Hadoop systems from a company called Cloudera in their fraud detection efforts, basically adding all the information they can get their hands on in an attempt to find new methods by which their payments networks might be abused.
A common theme is that the new systems make databases so fast and easy that companies invariably begin to find new uses for them. A big retailer, for example, might get into the habit of keeping a record of every click and mouse movement from every visitor to its site, with the goal of finding fresh insights about customer choices.
And as companies see the cost of storing data plummet, they are adjusting their notions of how much data they need to keep. Petabytes of storage aren’t just for Google anymore.
Lee Gomes, a writer in San Francisco, has covered Silicon Valley for 20 years, much of that time as a reporter for the Wall Street Journal.