Less Clumsy Code for the Cloud
A new tool takes a hint from 1980s database programming.
Cloud computing allows companies to store and process data more efficiently than ever. But the code that’s used to control the machines in a computing “cloud” remains surprisingly clunky. Now some researchers are exploring novel programming languages for controlling the cloud, and they’re borrowing an approach developed in the ’80s.
Most programming languages were never designed to handle so many computers, or so much data spread across them. Software frameworks such as Google’s MapReduce and an open-source alternative called Hadoop provide handy tools for doing this. But there’s room to make the process much more efficient.
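The MapReduce model can be illustrated in miniature. The following is a minimal single-machine sketch in Python, not the actual API of Hadoop or Google’s MapReduce. A hypothetical `map_words` function emits (word, 1) pairs, a shuffle step groups them by key, and a hypothetical `reduce_counts` function sums each group. In a real framework, the map and reduce calls would be spread across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_words(document: str) -> List[Tuple[str, int]]:
    """Map step (hypothetical): emit a (word, 1) pair for every word."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as a framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step (hypothetical): sum all the counts for one word."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # Each document could be mapped on a different machine.
    intermediate = [pair for doc in documents for pair in map_words(doc)]
    grouped = shuffle(intermediate)
    # Each key could be reduced on a different machine.
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["the cloud stores data", "the cloud processes data"]
    print(word_count(docs))
```

The programmer writes only the two small functions; splitting the work and moving data between machines is left to the framework.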
“We can’t keep programming computers the way we are,” says Joseph Hellerstein, a professor of computer science at the University of California at Berkeley. Hellerstein is involved with a project called BOOM, aimed at developing new techniques for programming the cloud. “People don’t have an easy way to write programs that take advantage of the fact that they could rent 100 machines at Amazon.”
Most software programs are made up of instructions that tell a computer to take a series of actions in a certain order. One of the big advantages of cloud computing is that it makes it possible to split up a program so that different instructions can be processed at the same time. But it’s hard to write the code needed to do that with most programming languages, and the result is often bloated software.
Hellerstein wants to make it possible to build software that runs on a much larger scale, across thousands of cloud-based machines, using far less code. To do this, he turned to research done in the ’80s on programming databases efficiently. Hellerstein says that database technologies, which can collect large sets of data and process them in a variety of ways, could be particularly good at taking advantage of the new computing power.
One reason is that database information is often processed in batches, and it doesn’t matter in what order a computer handles those batches. This makes it easy for programmers to divide database tasks among a lot of processors. So easy, in fact, that programs of this nature are sometimes referred to as “embarrassingly parallel.”
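A small Python sketch can show why order-independence matters. Here a thread pool stands in for the many processors or machines a real system would use, and the hypothetical `process_batch` function simply sums each batch. Because the per-batch results are combined with addition, which is commutative and associative, processing the batches in a shuffled order gives the same answer.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List


def process_batch(batch: List[int]) -> int:
    """Hypothetical per-batch work: here, just sum the records in the batch."""
    return sum(batch)


def total(batches: List[List[int]], workers: int = 4) -> int:
    # Batches are independent, so they can be handed to workers in any order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_batch, batches))
    # Addition is commutative and associative, so combining the partial
    # results does not depend on the order in which batches were processed.
    return sum(partial_results)


if __name__ == "__main__":
    records = list(range(1, 101))
    batches = [records[i:i + 10] for i in range(0, len(records), 10)]
    shuffled = batches[:]
    random.shuffle(shuffled)
    print(total(batches), total(shuffled))
```

If the combining step depended on order (for example, appending results to a list that must stay sorted by arrival), the workers would have to coordinate, and the work would no longer be embarrassingly parallel.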
Hellerstein’s group modified an old language called Datalog so that it could be used to write programs for the cloud. The challenge, Hellerstein says, is figuring out how much of a program can happen simultaneously, and identifying the points at which it absolutely has to stop and gather information about the status of different tasks. The group is currently developing a language called Bloom to provide a “friendly way” for programmers to deal with the often complex syntax of the underlying Datalog-based system.
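To give a flavor of the Datalog style, the sketch below evaluates the classic two-rule reachability program, `path(X, Y) :- link(X, Y).` and `path(X, Z) :- link(X, Y), path(Y, Z).`, in plain Python. It is not Bloom or the BOOM group’s Datalog dialect; the relation names and the naive fixpoint loop are illustrative assumptions. Within each round, every combination of facts can be checked independently, while the test for whether any new facts appeared is loosely analogous to a point where the program must stop and gather status.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]


def reachability(link: Set[Fact]) -> Set[Fact]:
    """Naive bottom-up evaluation (illustrative, not Bloom) of:
        path(X, Y) :- link(X, Y).
        path(X, Z) :- link(X, Y), path(Y, Z).
    """
    path: Set[Fact] = set(link)  # first rule: every link is a path
    while True:
        # Second rule: join link(X, Y) with path(Y, Z). Each pair of facts
        # can be checked independently, so this step parallelizes well.
        derived = {(x, z) for (x, y) in link for (y2, z) in path if y == y2}
        new_facts = derived - path
        # Coordination point: every worker's results must be in before we
        # know whether a fixpoint (no new facts) has been reached.
        if not new_facts:
            return path
        path |= new_facts


if __name__ == "__main__":
    links = {("a", "b"), ("b", "c"), ("c", "d")}
    print(sorted(reachability(links)))
```

The program states only what a path is, not the order in which to compute paths, which is what leaves the evaluator free to spread the work out.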
Hellerstein hopes that Bloom will help programmers write software for the cloud without having to step entirely away from familiar languages. His group is designing Bloom as a library that can be used with popular languages such as Java and Python. Using Bloom within one of those languages would encourage programmers to design software that makes the most efficient use of the resources available in the cloud, while sparing them from having to learn an entirely new language.
Georg Gottlob, a professor at the Oxford University Computing Laboratory and an expert on Datalog, says that using the language to handle Web-scale applications makes a lot of sense. When Datalog was invented, he says, “it came too early” to be used broadly due to the limited processing power at the time. With the proliferation of distributed computing, Gottlob says, the language has seen “a big Renaissance.”
“This is where the future is going,” says Elias Torres, who uses cloud computing tools at several startup companies. Several years ago, Torres was able to use a precursor of Bloom in a project to simplify a complex protocol for distributed systems. “It was enlightening, because I was able to focus on the data flowing through the system,” he says.
Nowadays, Torres says, “you need to understand how the data needs to be stored and organized and accessed in order to make progress.” Any Web application has to deal with ever-growing volumes of data, and Torres believes programmers will find tools like Bloom increasingly useful.