Cloud computing allows companies to store and process data more efficiently than ever. But the code that’s used to control the machines in a computing “cloud” remains surprisingly clunky. Now some researchers are exploring novel programming languages for controlling the cloud, and they’re borrowing an approach developed in the ’80s.
Most programming languages were never designed to handle so many computers or so much data spread out across them. Software frameworks such as Google’s MapReduce and an open competitor called Hadoop provide handy tools for doing this. But there’s room to make the process much more efficient.
“We can’t keep programming computers the way we are,” says Joseph Hellerstein, a professor of computer science at the University of California at Berkeley. Hellerstein is involved with a project called BOOM, aimed at developing new techniques for programming the cloud. “People don’t have an easy way to write programs that take advantage of the fact that they could rent 100 machines at Amazon.”
Most software programs are made up of instructions that tell a computer to take a series of actions in a certain order. One of the big advantages of cloud computing is that it makes it possible to split up a program so that different instructions can be processed at the same time. But it’s hard to write the code needed to do that with most programming languages, and this problem results in bloated software.
Hellerstein wants to make it possible to build software that runs on a much larger scale, across thousands of cloud-based machines, using far less code. To do this, he turned to research done in the ’80s on programming databases efficiently. Hellerstein says that database technologies, which can collect large sets of data and process them in a variety of ways, could be particularly successful at taking advantage of the new computing power.
One reason is that database information is often processed in batches, and it doesn’t matter which order a computer uses to handle these batches. This makes it easy for programmers to divide database tasks among a lot of processors. So easy, in fact, that programs of this nature are sometimes referred to as “embarrassingly parallel.”
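To make the idea concrete, here is a minimal sketch in Python of an order-independent batch job. It is an illustration only and is not taken from the BOOM project: the function names, the sample records, and the word-counting task are all assumptions, and a local thread pool stands in for the many machines a real cloud job would use.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def count_words(batch):
    """Process one batch on its own: count the words in its records."""
    return sum(len(record.split()) for record in batch)


def total_words(batches, workers=4):
    """Hand each batch to a pool of workers and add up the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(count_words, batches)
        return sum(partial_counts)


# Hypothetical sample data: three batches of text records.
batches = [
    ["the cloud stores data", "machines run programs"],
    ["databases process batches"],
    ["order does not matter here", "each batch stands alone"],
]

# Shuffle a copy to show that the order of the batches is irrelevant.
shuffled = batches[:]
random.shuffle(shuffled)

in_order = total_words(batches)
any_order = total_words(shuffled)
print(in_order, any_order)
```

Because no batch depends on the result of another, the two totals match however the batches are ordered or assigned to workers. That independence is what lets the same job be spread across many processors with little extra code.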