Basking in Big Data

Visualization software makes viewing and interacting with enormous data sets practical without a supercomputer.

Kate Greene

January 16, 2009

In some ways, science is suffering from too much data. Experiments and computer simulations analyzing everything from the dynamics of climate change to the precise details of folding proteins can churn out billions of numbers describing these physical phenomena. Making sense of all this data remains a challenge.

**Data extraction:** This image shows an experiment in which aerogel, a porous material, is bombarded by a micrometeoroid traveling at five kilometers per second. Aerogels are commonly used to shield electronic equipment in satellites because they are both durable and extremely light. The Morse-Smale complex identifies the structure of the porous solid as the micrometeoroid enters it, providing detailed information about the filament structure of the material (shown at right).

Recently, however, researchers at the University of California, Davis, and Lawrence Livermore National Laboratory announced that they have developed software that makes analysis and visualization of huge data sets possible without the aid of a supercomputer. The researchers’ algorithm slices up data into more manageable chunks, then stitches it back together on the fly, so that the data can be manipulated in three dimensions, all on a computer with the power and capacity of a high-end laptop.

The team’s algorithm offers a practical way to get structural information about materials, proteins, and fluids, says Attila Gyulassy, the researcher at UC Davis who led the project. It allows users to “interactively visualize, rotate, apply different transfer functions, and highlight different aspects of the data,” he says.


The software employs a mathematical tool called the Morse-Smale complex, which has been used for around four years to extract and visualize elements of large data sets by sorting them into segments that contain mathematically similar features. But while the Morse-Smale complex has been known for decades, it normally requires huge amounts of memory to perform the necessary calculations on a computer.

Gyulassy and his colleagues found a solution to this memory problem by writing an algorithm that breaks apart a data set before using the Morse-Smale complex, then stitches the blocks back together again. This means that only a small amount of data is needed at each step, so much less has to be stored in memory. As a result, the software can run on a desktop computer with just two gigabytes of memory.
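The divide-and-stitch idea can be illustrated with a much simpler stand-in. The sketch below is not the researchers' algorithm and does not compute a Morse-Smale complex; it only finds local minima of a small 2-D grid, one kind of critical point that such a complex is built from. The function names, the 4-neighbor rule, and the one-cell overlap ("halo") around each block are assumptions made for illustration. The point is that each block needs only its own cells plus a thin border, and the merged result matches what a whole-grid pass would give.

```python
from typing import List, Set, Tuple

Grid = List[List[float]]
Point = Tuple[int, int]

# The four edge-sharing neighbor offsets (an illustrative choice).
NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def is_local_min(grid: Grid, r: int, c: int) -> bool:
    """True if grid[r][c] is strictly lower than every in-bounds neighbor."""
    rows, cols = len(grid), len(grid[0])
    for dr, dc in NEIGHBORS:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] <= grid[r][c]:
            return False
    return True


def minima_whole(grid: Grid) -> Set[Point]:
    """Reference result: scan the entire grid in one pass."""
    return {
        (r, c)
        for r in range(len(grid))
        for c in range(len(grid[0]))
        if is_local_min(grid, r, c)
    }


def minima_blockwise(grid: Grid, block: int) -> Set[Point]:
    """Scan block by block, holding only one block plus a one-cell halo."""
    rows, cols = len(grid), len(grid[0])
    found: Set[Point] = set()
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            r1, c1 = min(r0 + block, rows), min(c0 + block, cols)
            # Extend the block by one cell on each side, clipped to the grid,
            # so cells on the block edge can still see their true neighbors.
            hr0, hc0 = max(r0 - 1, 0), max(c0 - 1, 0)
            hr1, hc1 = min(r1 + 1, rows), min(c1 + 1, cols)
            # Copy only this window; a real tool would read it from disk.
            window = [row[hc0:hc1] for row in grid[hr0:hr1]]
            for r in range(r0, r1):
                for c in range(c0, c1):
                    if is_local_min(window, r - hr0, c - hc0):
                        found.add((r, c))  # stitch: record global coordinates
    return found


if __name__ == "__main__":
    sample = [
        [5, 4, 6, 7, 8],
        [3, 1, 5, 2, 9],
        [6, 5, 7, 6, 8],
        [9, 8, 0, 7, 3],
    ]
    print(sorted(minima_blockwise(sample, block=2)))
    print(minima_blockwise(sample, block=2) == minima_whole(sample))
```

In this toy version the stitching step is trivial, because each minimum can be decided from one block and its halo. Assembling a full Morse-Smale complex across block boundaries is considerably more involved, and the article does not describe how the researchers do it.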

Memory is one of the big limiting factors when trying to perform complex analysis of large data sets, says Peter Schröder, a professor of computer science at the California Institute of Technology, in Pasadena. “You can’t even fit the stuff in memory,” he says. “But [the researchers] have addressed it.”

Schröder adds that, while the new software isn’t the only data-visualization tool available, it looks particularly powerful and practical for a number of scientific applications. Algorithms such as this are changing science, he adds: “Things that used to be considered too abstract or too crazy to use for data analysis are turning not just into algorithms, but practical algorithms.”

Gyulassy says that his team has plans to release an open-source software library by the end of March so that other researchers can take advantage of the approach, and modify it to suit their needs.
