Today’s data, however, doesn’t take the familiar form of the database. “The information’s not coming at you in a clean tabular form,” Apte says. “It’s coming at you in a network form.” Often it arrives as a graph, he explains, like those used by social media. These graphs record not only the complex connections between nodes but also other types of information in a variety of formats, such as the videos, images, and comments that people post on social networks.
Social media may have started the trend toward analyzing such graphs, Apte says, but network data comes from other sources as well—for example, from complex engineering systems such as the electric power grid, water distribution systems, and traffic management systems. The distributed sensor networks in these systems produce data sets in which the connections between locations are as important as friendships between individuals in a social network. Understanding such connections is the key to optimizing systems and making them sustainable, Apte says.
People have been working with graphs of data for hundreds of years, Apte says, but the graphs now being plotted from social networks or sensor networks are of unprecedented scale. “These are gigantic graphs,” he says. “You’re talking about millions of nodes and tens of millions of links.”
Dealing with graphs of that size and scope, and applying modern analytic tools to them, calls for better algorithms and other innovations. Apte says one goal of the conference is to bring cutting-edge techniques from academia and industry research labs to the attention of businesses, so that companies can put those techniques to use more quickly. At the same time, the conference organizers hope, academics will get a sense of the business challenges that most urgently need to be addressed.
Fayyad says that the intense business interest in data has changed the field of data mining. Scientists, he says, mainly dealt with data stored in neat, structured forms. But most of the data that businesses are producing is an unstructured mess.
“While the scientists were getting pretty good at avoiding that stuff, the businesses were being forced to take it head-on,” Fayyad says. “It drove the companies to start developing techniques that no one had ever attempted.”
Certainly, challenges remain, but, Fayyad says, “people are able to come up with a lot more predictive models, and more importantly score them [to determine how well they work] … It takes analysis to a level that’s truly beyond human brain comprehension.”