Supercomputers churn through vast amounts of data using multiple processing cores working in parallel. But most supercomputers dip into a database to find information, process the data, and then produce a result, a process that can take minutes or days, depending on the task. In recent years, however, researchers have started to explore the potential of stream computing, an approach that lets them crunch a real-time stream of data in microseconds. Data from traffic cameras, accident reports, and weather could be used to predict traffic, and streaming audio could be transcribed or translated more quickly.
Now IBM has shown that stream computing can be used to analyze market data faster than ever before. The result is a machine that helps automated trading systems determine the price of securities using financial events that have just occurred. To build the system, the computing company partnered with TD Securities, an investment-banking firm, to tweak IBM software called InfoSphere Streams for financial data. The firm ran the software on one of the latest IBM supercomputers, known as Blue Gene/P.
IBM’s system improves upon current financial trading systems, which collect data from numerous sources around the world, including constantly fluctuating stock prices and trading volumes. This information is broken into chunks, called messages, which are sent through trading systems. The more messages a system can examine, the more security prices it can determine, and the more options can be sold using automated trading machines that match buyers with sellers.
The significant advance, says Nagui Halim, chief scientist of the stream-computing project at IBM, is that the engineers optimized the software to run on Blue Gene/P so that the data streams were analyzed faster than is possible on other financial-analysis systems. The information arrived at a rate of five million messages per second, says Halim. The system could process a message within 200 microseconds. The result: a supercomputer that produces security prices 21 times faster than any other financial-trading system.
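To put those two figures side by side, here is a rough back-of-the-envelope calculation using Little's law (items in flight = arrival rate × time in system). It assumes, beyond what the article states, that the five-million-per-second rate is sustained and that 200 microseconds is the full time a message spends in the system. The function name is purely illustrative.

```python
def messages_in_flight(arrival_rate_per_s: int, latency_us: int) -> int:
    """Little's law: average items in the system = arrival rate x time in system.

    Integer arithmetic is used so the result is exact; latency is in microseconds.
    """
    return arrival_rate_per_s * latency_us // 1_000_000


# Figures quoted in the article; 200 microseconds is an upper bound per message.
in_flight = messages_in_flight(5_000_000, 200)
print(in_flight)  # 1000
```

Under those assumptions, up to about 1,000 messages would be in the middle of processing at any instant, which hints at why the work has to be spread across many cores.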
In some instances, says Halim, it’s critical to process the data as it comes in. A system that IBM has built monitors the vital signs of patients, such as their blood gas levels, and keeps track of patient statistics, such as their weight and medication regime. Data from these feeds, which can number in the hundreds, are analyzed and correlated, producing a picture of the patient’s health that would be impossible to draw from doctors’ or nurses’ observations alone.
IBM’s financial stream-computing system rests on three concepts, explains Halim. The first is the use of streams, data flows that move in one direction through the system. The second is the fact that data is processed in chunks, or windows, within that stream. And the third is the use of a collection of algorithms that record the rate at which the data comes in, that understand the capabilities of the hardware, and that direct the streams in the most efficient ways. These algorithms can take a stream and “spread it around in different ways,” Halim says, and “partition it on different kinds of hardware that are specialized to do certain tasks.”
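The three concepts can be illustrated with a toy sketch in Python. This is not IBM's software or its API: the message format, the `route` rule, and the worker names are all hypothetical, and the routing here looks only at the kind of payload rather than at data rates or hardware capabilities as the real system is said to do.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import islice
from typing import Iterable, Iterator


def tumbling_windows(stream: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Cut a one-directional stream into consecutive, non-overlapping windows."""
    it = iter(stream)
    while True:
        window = list(islice(it, size))
        if not window:
            return
        yield window


def route(message: dict) -> str:
    """Hypothetical routing rule: text goes to one worker type, numbers to another."""
    if isinstance(message["payload"], str):
        return "text_worker"
    return "numeric_worker"


def process(stream: Iterable[dict], window_size: int) -> list[dict]:
    """Summarize each window after splitting its messages by worker type."""
    summaries = []
    for window in tumbling_windows(stream, window_size):
        by_worker = defaultdict(list)
        for message in window:
            by_worker[route(message)].append(message)
        prices = [m["payload"] for m in by_worker["numeric_worker"]]
        summaries.append({
            "mean_price": sum(prices) / len(prices) if prices else None,
            "headlines": len(by_worker["text_worker"]),
        })
    return summaries


# Made-up example messages: three price ticks and one news headline.
feed = [
    {"payload": 10.0},
    {"payload": "CEO in failing health"},
    {"payload": 20.0},
    {"payload": 30.0},
]
print(process(feed, window_size=3))
```

Each message is seen once, in arrival order, and results are produced per window rather than after the whole data set has been stored, which is the essential difference from the database-first approach described earlier.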
For instance, some cores of a supercomputer might be optimized to process and summarize the text in news reports, such as a story about the failing health of a company’s popular CEO, while others are better at performing simple mathematical operations on numbers that flow into the system. IBM has developed its own stream-computing language, called Spade, that can assess the capabilities of a supercomputer and spread the data flows around appropriately, without needing much input from a programmer. Spade makes it possible, says Halim, for stream computing to run on other multiple-processing systems, not just Blue Gene/P.
Stream computing is not a new idea. In fact, concepts for processing data as it enters a computer were around in the 1960s, says Saman Amarasinghe, a professor of electrical engineering and computer science at MIT. But in recent years, it has become more practical to use, thanks to the growing popularity of multicore chips, which have multiple processing centers that crunch numbers independently. Streams of data can be broken up and partitioned to individual cores relatively easily, says Amarasinghe.
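A minimal sketch of that kind of partitioning, in Python, might look like the following. It is an illustration only, not how IBM's system works: the `(symbol, price)` message format and the function names are assumptions, and threads stand in for cores (in standard Python they would not speed up CPU-bound work, but the structure is the same).

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition_by_key(messages, num_partitions):
    """Assign each (symbol, price) message to a partition via a stable hash of its symbol."""
    partitions = [[] for _ in range(num_partitions)]
    for symbol, price in messages:
        index = zlib.crc32(symbol.encode("utf-8")) % num_partitions
        partitions[index].append((symbol, price))
    return partitions


def latest_prices(partition):
    """Independent per-partition work: keep the most recent price seen for each symbol."""
    latest = {}
    for symbol, price in partition:
        latest[symbol] = price
    return latest


def process_in_parallel(messages, num_workers=4):
    """Partition the messages, process each partition on its own worker, merge the results."""
    partitions = partition_by_key(messages, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(latest_prices, partitions))
    merged = {}
    for partial in partial_results:
        merged.update(partial)
    return merged


# Made-up ticks for illustration.
ticks = [("IBM", 100.0), ("TD", 50.0), ("IBM", 101.5)]
print(process_in_parallel(ticks))
```

Because every message for a given symbol lands in the same partition, in arrival order, the workers never need to coordinate with one another, which is what makes this style of splitting relatively easy.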
Amarasinghe adds that IBM has improved on the more academic, theoretical work in stream computing and has applied it to real-world problems. “IBM has brought stream computing to high performance,” he says. “They can make it run very fast.”
Amarasinghe suspects that the popularity of stream computing will grow because of a confluence of factors. First, the chip-making industry plans to keep increasing the number of cores that it builds on its chips. Second, stream computing is a relatively straightforward programming approach to making use of these multiple cores. Third, “there’s an explosion of data,” he says, “and it’s the type of data that streams in, like video and audio.” Stream computing could even lead to more advanced user interfaces for computers that can process real-time video and audio interactions from people, he says.