IBM Builds Biggest Data Drive Ever

The system could enable detailed simulations of real-world phenomena—or store 24 billion MP3s.

  • Thursday, August 25, 2011
  • By Tom Simonite

A data repository almost 10 times bigger than any made before is being built by researchers at IBM's Almaden, California, research lab. The 120 petabyte "drive"—that's 120 million gigabytes—is made up of 200,000 conventional hard disk drives working together. The giant data container is expected to store around one trillion files and should provide the space needed to allow more powerful simulations of complex systems, like those used to model weather and climate.

A 120 petabyte drive could hold 24 billion typical five-megabyte MP3 files or comfortably swallow 60 copies of the biggest backup of the Web, the 150 billion pages that make up the Internet Archive's WayBack Machine.
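The figures quoted so far can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, where one petabyte is 10^15 bytes, which is the convention the "120 million gigabytes" figure implies. The per-drive average and the implied size of the Internet Archive backup are derived here from the article's numbers rather than stated by IBM, and the function and key names are illustrative only.

```python
# Decimal (SI) storage units, assumed because the article equates
# 120 petabytes with 120 million gigabytes.
PB = 10**15
GB = 10**9
MB = 10**6


def capacity_summary(capacity_pb=120, drives=200_000, mp3_mb=5, archive_copies=60):
    """Derive the article's headline comparisons from its raw figures."""
    capacity = capacity_pb * PB  # total capacity in bytes
    return {
        # Total capacity expressed in gigabytes.
        "gigabytes": capacity // GB,
        # How many MP3 files of mp3_mb megabytes would fit.
        "mp3_files": capacity // (mp3_mb * MB),
        # Average raw capacity per drive (derived, not stated in the article).
        "avg_gb_per_drive": capacity / drives / GB,
        # Size of one Web backup if archive_copies of them fill the system.
        "implied_archive_pb": capacity / archive_copies / PB,
    }


summary = capacity_summary()
for name, value in summary.items():
    print(f"{name}: {value:,}")
```

The per-drive average is a simple division of total capacity by drive count; it says nothing about how much of each disk holds redundant copies or metadata.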

The data storage group at IBM Almaden is developing the record-breaking storage system for an unnamed client that needs a new supercomputer for detailed simulations of real-world phenomena. However, the new technologies developed to build such a large repository could enable similar systems for more conventional commercial computing, says Bruce Hillsberg, director of storage research at IBM and leader of the project.

"This 120 petabyte system is on the lunatic fringe now, but in a few years it may be that all cloud computing systems are like it," Hillsberg says. Just keeping track of the names, types, and other attributes of the files stored in the system will consume around two petabytes of its capacity.

Steve Conway, a vice president of research with the analyst firm IDC who specializes in high-performance computing (HPC), says IBM's repository is significantly bigger than previous storage systems. "A 120-petabyte storage array would easily be the largest I've encountered," he says. The largest arrays available today are about 15 petabytes in size. Supercomputing problems that could benefit from more data storage include weather forecasts, seismic processing in the petroleum industry, and molecular studies of genomes or proteins, says Conway.

IBM's engineers developed a series of new hardware and software techniques to enable such a large hike in data-storage capacity. One challenge was finding a way to efficiently combine the thousands of hard drives that the system is built from. As in most data centers, the drives sit in horizontal drawers stacked inside tall racks. But IBM's researchers had to make the drawers significantly wider than usual to fit more disks into a smaller area, and the disks must be cooled with circulating water rather than standard fans.

The inevitable failures that occur regularly in such a large collection of disks present another major challenge, says Hillsberg. IBM uses the standard tactic of storing multiple copies of data on different disks, but it employs new refinements that allow a supercomputer to keep working at almost full speed even when a drive breaks down.
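The "standard tactic" of keeping multiple copies on different disks can be illustrated with a toy model. The sketch below is generic n-way replication, not IBM's design: the article does not describe the refinements IBM added, and the class name, method names, and hash-based placement rule are all assumptions made for illustration.

```python
import hashlib


class ReplicatedStore:
    """Toy n-way replication across simulated disks (hypothetical design)."""

    def __init__(self, num_disks, copies=3):
        if copies > num_disks:
            raise ValueError("need at least as many disks as copies")
        self.disks = [dict() for _ in range(num_disks)]  # one dict per disk
        self.failed = set()                              # indices of dead disks
        self.copies = copies

    def _placement(self, key):
        # Hash the key to pick a starting disk, then take the following disks
        # in order, so the copies of one block land on distinct disks.
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(self.disks)
        return [(start + i) % len(self.disks) for i in range(self.copies)]

    def write(self, key, data):
        # Store a copy on every healthy disk chosen for this key.
        for disk in self._placement(key):
            if disk not in self.failed:
                self.disks[disk][key] = data

    def fail(self, disk):
        # Simulate a drive breaking down: its contents are gone.
        self.failed.add(disk)
        self.disks[disk].clear()

    def read(self, key):
        # Return the first surviving copy; fail only if every copy is lost.
        for disk in self._placement(key):
            if disk not in self.failed and key in self.disks[disk]:
                return self.disks[disk][key]
        raise IOError(f"all copies of {key!r} lost")


store = ReplicatedStore(num_disks=10, copies=3)
store.write("block-42", b"climate data")
store.fail(store._placement("block-42")[0])  # lose the first copy
print(store.read("block-42"))                # still served from another disk
```

A real system would also rebuild the lost copies onto healthy disks after a failure; this sketch leaves that out and shows only that reads keep working while at least one copy survives.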
