Last week, Intel, Yahoo, HP, and an international trio of research institutions announced a joint cloud-computing research initiative. The ambitious six-site project is aimed at developing an Internet-based computer infrastructure stable enough to host companies’ most critical data-processing tasks. The project also holds unusual promise for advances in fields as diverse as climate-change modeling and molecular biology.
The new array of six linked data centers, one operated by each project sponsor, will be one of the largest experiments to date focusing on cloud computing. The umbrella term covers moving complex computing tasks, such as data processing and storage, into a network-connected “cloud” of external data centers, which might perform similar tasks for multiple customers.
The project’s scope will allow researchers to test and develop security, networking, and infrastructure components at a scale that simulates an open Internet environment. To test this infrastructure, academic researchers will also run real-world, data-intensive projects that could, in their own right, yield advances in fields as varied as data mining, context-sensitive Web search, and communication in virtual-reality environments.
“Making this marriage of substantial processing power, computing resources, and data resources work efficiently, seamlessly, and transparently is the challenge,” says Michael Heath, interim head of the computer-science department at the University of Illinois at Urbana-Champaign, an institute that is part of the alliance. Heath says that for the project to be successful, the team, which also includes Germany’s Karlsruhe Institute of Technology and Singapore’s Infocomm Development Authority, needs “to be running realistic applications.”
Much of the technology industry has recently focused on cloud computing as a next critical architectural advance, but even backers say that the model remains technologically immature.
Web-based software and the ability to “rent” processing power or data storage from outside companies are already common. The most ambitious visions of cloud computing expand on this, predicting that companies will ultimately use remotely hosted cloud services to perform even their most complex computing activities. However, creating an online environment where these complicated tasks are secure, fast, reliable, and simple still presents considerable challenges.
Virtually every big technology company, including Google, IBM, Microsoft, and AT&T, already has a cloud-computing initiative. Farthest along commercially may be Amazon, whose Web Services division already hosts computing, storage, databases, and other resources for some customers.
The new cloud-computing project will consist of six computing clusters, one housed with each founding member of the partnership, with each containing between 1,000 and 4,000 processors. Each of the companies involved has a specific set of research projects planned, with many broadly focusing on operational issues such as security, load balancing, managing parallel processes on a very large scale, and how to configure and secure virtual machines across different locations.
Researchers will be given unusually broad latitude to modify the project’s architecture from top to bottom, developing and experimenting with ideas applying to hardware, software, networking functions, and applications. Project managers say that one goal is to see how changes at one technical level affect others.
“In the cloud, we have the opportunity for integrated design, where one entity can make design choices across an entire environment,” says Russ Daniels, chief technology officer of HP’s cloud-services division. “This way, we can understand the impact of design choices that we make at the infrastructure level, as well as the impact they have on higher-level systems.”
HP, for example, will be focusing in part on an ongoing project called Cells as a Service, an effort to create secure virtual “containers” that are composed of virtual machines, virtual storage volumes, and virtual networks. The containers can be split between separate data centers but still treated by consumers as a traditional, real-world collection of hardware.
Among Yahoo’s specific projects will be the development of Hadoop, an open-source software platform for creating large-scale data-processing and data-querying applications. Yahoo has already built one big cloud-computing facility called M45 that is operated in conjunction with Carnegie Mellon University. M45 will also be folded into this new project.
Running in parallel with this systems-level research will be an assortment of other research projects designed to test the cloud infrastructure.
Computer scientists at the Illinois facility have a handful of data- and processing-intensive projects under way that are likely to be ported to the cloud facilities. According to Heath, one key thrust will be “deep search” and information extraction, such as allowing a computer to understand the real-world context of the contents found in a Web page. For example, today’s search engines have difficulty understanding that a phone number is in fact an active phone number, rather than just a series of digits. A project run by Urbana-Champaign professor Kevin Chang is exploring the idea of using the massive quantities of data collected by Web-wide search engines as a kind of cross-reference tool, so that the joint appearance of “555-1212” with “John Borland” multiple times online might identify the number as a phone number and associate it with that particular name.
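The cross-referencing idea can be illustrated with a minimal sketch. It is not Chang’s actual method: the function names, the seven-digit pattern, and the two-page threshold are all assumptions made for illustration. The sketch counts how many pages mention a number-like string together with a known name, and keeps only the pairs that recur.

```python
import re
from collections import Counter

# Hypothetical pattern: matches seven-digit strings such as "555-1212".
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{4}\b")


def cooccurrence_counts(pages, names):
    """Count the pages on which each (number, name) pair appears together."""
    counts = Counter()
    for text in pages:
        numbers = set(PHONE_PATTERN.findall(text))
        present = {name for name in names if name in text}
        for number in numbers:
            for name in present:
                counts[(number, name)] += 1
    return counts


def likely_phone_numbers(pages, names, min_pages=2):
    """Keep pairs seen on at least min_pages pages (an assumed threshold)."""
    counts = cooccurrence_counts(pages, names)
    return {pair: n for pair, n in counts.items() if n >= min_pages}


if __name__ == "__main__":
    sample_pages = [
        "Call John Borland at 555-1212.",
        "Contact: John Borland, 555-1212",
        "Order 555-1212 has shipped.",
    ]
    print(likely_phone_numbers(sample_pages, ["John Borland"]))
```

A single page pairing a number with a name proves little; the signal comes from repetition across many independent pages, which is why the approach depends on search-engine-scale data.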
Heath says that other projects might include experiments with tele-immersive communication: virtual-reality-type environments that let computers provide physical, or haptic, feedback to users as they communicate or engage in real-world activities controlled remotely over the Web.
In an e-mail, Intel Research vice president Andrew Chien said that other topics could include climate modeling, molecular biology, industrial design, and digital library research.
“By looking at what people are really doing, we will learn about what is really important from an infrastructure perspective,” says Raghu Ramakrishnan, chief scientist for Yahoo’s Cloud Computing and Data Infrastructure Group. “We already know enough to put forth systems that are usable today, but not enough that we can deliver on all the promise that people see in the paradigm.”