Assembling the Digital Sky
U.S. astronomers are gathering terabytes of data into a worldwide “virtual observatory” that will be accessible to scientists and laymen alike.
Scientists in the United States, armed with a $10 million grant from the National Science Foundation, are building a National Virtual Observatory (NVO) that will make the world’s huge store of astronomical data available to anyone with a Web browser.
“History has shown us that the greatest leaps forward have occurred not when you observe the universe through just one window, but when you compare the views of the universe obtained through different windows,” says Ray Norris, deputy director of the Australia Telescope National Facility in Epping, New South Wales, Australia. “The NVO will enable any astronomer to do this easily, combining all available data on one object or one region of the sky, or perhaps even using data-mining techniques to look for subtle correlations between the properties of a class of objects when viewed through different windows.”
The hope is to dramatically advance this computational approach to astronomy. “I can imagine entire research projects being done from NVO data,” says Bob Hanisch, the NVO project manager and an astronomer at the Space Telescope Science Institute in Baltimore.
The inspiration for the NVO is the Sloan Digital Sky Survey, an electronic catalog of images in multiple wavelengths spanning half the northern sky: 100 million celestial objects in all, encoded in four databases and viewable from a Web portal. The NVO will take the Sloan survey and combine it with other, smaller U.S. and international surveys, including some maintained by the United Kingdom, Australia, India, and the European Union.
As the “virtual” in NVO suggests, the project is more about computing than about the optical telescope images and gamma ray, infrared, radio, ultraviolet, and X-ray snapshots of the heavens collected in the surveys. The main hardware platform will be the emerging “grids” that federate research centers’ supercomputers, servers, and high-speed networks into single, powerful computing resources. The NVO will both depend on grid computing and demonstrate its usefulness, astronomy being an uncommonly good test case, say NVO advocates, because of its large yet manageable universe of free, publicly available data.
Building out grids is more the task of participating grid-computing hotbeds such as the San Diego Supercomputer Center (SDSC). For their part, NVO architects will instead tackle other challenges on the bleeding edge of computing, most of which involve managing large distributed databases. The trick is to make a collection of fundamentally different databases (some in Oracle, others in SQL Server, for example) work uniformly with the software that displays and analyzes the information. The databases themselves will usually remain in separate locations to avoid clogging network bandwidth, but performance will still be an issue, especially when researchers want to run complex queries. In response, Hanisch says, NVO data centers plan to offer additional services that take over such jobs from remote PCs.
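To make the federation idea concrete, here is a minimal sketch in Python. It is not the NVO's software: the two in-memory SQLite databases merely stand in for archives built on different database products, and every name in it (make_site_a, query_site_b, federated_search, the table layouts) is invented for illustration. Each site keeps its own schema and runs the query locally; a small adapter converts the matching rows into one common record format, so only the results travel over the network.

```python
import sqlite3


def make_site_a():
    """Hypothetical archive A: right ascension stored in degrees."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE objects (name TEXT, ra REAL, dec REAL)")
    db.executemany("INSERT INTO objects VALUES (?, ?, ?)",
                   [("A1", 10.0, -5.0), ("A2", 150.0, 2.0)])
    return db


def make_site_b():
    """Hypothetical archive B: different schema, right ascension in hours."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sources (id TEXT, ra_hours REAL, dec_deg REAL)")
    db.executemany("INSERT INTO sources VALUES (?, ?, ?)",
                   [("B1", 0.5, -4.5), ("B2", 20.0, 40.0)])
    return db


def query_site_a(db, ra_min, ra_max):
    """Adapter for archive A: filter locally, return common-format records."""
    rows = db.execute(
        "SELECT name, ra, dec FROM objects WHERE ra BETWEEN ? AND ?",
        (ra_min, ra_max))
    return [{"id": n, "ra_deg": ra, "dec_deg": dec, "site": "A"}
            for n, ra, dec in rows]


def query_site_b(db, ra_min, ra_max):
    """Adapter for archive B: convert hours to degrees (1 h = 15 deg)."""
    rows = db.execute(
        "SELECT id, ra_hours * 15.0, dec_deg FROM sources "
        "WHERE ra_hours * 15.0 BETWEEN ? AND ?",
        (ra_min, ra_max))
    return [{"id": i, "ra_deg": ra, "dec_deg": dec, "site": "B"}
            for i, ra, dec in rows]


def federated_search(sites, ra_min, ra_max):
    """Run the same query at every site and merge the small result sets."""
    merged = []
    for db, adapter in sites:
        merged.extend(adapter(db, ra_min, ra_max))
    return sorted(merged, key=lambda record: record["ra_deg"])


if __name__ == "__main__":
    sites = [(make_site_a(), query_site_a), (make_site_b(), query_site_b)]
    for record in federated_search(sites, 5.0, 20.0):
        print(record)
```

The design choice worth noticing is that the filtering happens inside each database, which mirrors the article's point about leaving the data where it lives rather than shipping whole catalogs across the network.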
Other database-intensive disciplines, such as bioinformatics, astrophysics, and the earth sciences, stand to gain from potential advances in grid computing and database technology. Bioinformatics is eyeing the NVO for new approaches to storing and exchanging multi-gigabyte maps of the human genome. Earth scientists are also involved in the NVO research effort because, Hanisch says, like astronomers, they work by comparing data from different instruments.
The initial NSF-funded work focuses on data interoperability, a key component of which is VOTable 1.0, a data-exchange standard released on April 15 that uses the Extensible Markup Language (XML) to represent large datasets. “We are putting VOTable into practical, everyday use now,” Hanisch says. Next on tap: the Simple Image Access Prototype specification, an image-handling complement to VOTable now under discussion with international partners. In addition, Hanisch expects within a year or two to see Web services directories that will make it easier to deliver and search through newly published data.
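For a sense of what an XML table format of this kind looks like, the Python sketch below reads a tiny hand-written document modeled on VOTable's general layout: FIELD elements describe the columns, and TR/TD elements carry the rows. The sample document, the parse_votable helper, and the CONVERTERS table are assumptions made for illustration; real VOTable files have more features (XML namespaces, binary encodings, null handling) that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of a VOTable document (illustrative only).
SAMPLE = """<VOTABLE version="1.0">
  <RESOURCE>
    <TABLE name="demo">
      <FIELD name="RA" datatype="double" unit="deg"/>
      <FIELD name="Dec" datatype="double" unit="deg"/>
      <FIELD name="Name" datatype="char" arraysize="*"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.68</TD><TD>41.27</TD><TD>M31</TD></TR>
          <TR><TD>83.82</TD><TD>-5.39</TD><TD>M42</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

# Map a column's declared datatype to a Python conversion function.
CONVERTERS = {"double": float, "float": float, "int": int, "char": str}


def parse_votable(xml_text):
    """Return (fields, rows) for the first TABLE in the document."""
    root = ET.fromstring(xml_text)
    table = root.find(".//TABLE")
    # Each FIELD's attributes (name, datatype, unit, ...) describe one column.
    fields = [dict(f.attrib) for f in table.findall("FIELD")]
    rows = []
    for tr in table.iter("TR"):
        cells = [td.text or "" for td in tr.findall("TD")]
        row = {}
        for field, cell in zip(fields, cells):
            convert = CONVERTERS.get(field.get("datatype"), str)
            row[field["name"]] = convert(cell)
        rows.append(row)
    return fields, rows


if __name__ == "__main__":
    fields, rows = parse_votable(SAMPLE)
    print([f["name"] for f in fields])
    for row in rows:
        print(row)
```

Because the column descriptions travel with the data, a program that has never seen a given catalog can still tell which cells are numbers and what units they carry.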
Metadata (data about data) and Semantic Web technology are two other elements the NVO team has deemed essential in its ambitious effort to federate the data of an entire scientific discipline. “The rate at which services are being defined is limited by how fast the community can reach consensus on difficult semantic and knowledge-management issues,” says Reagan Moore, an associate director at SDSC. “Given the need for a consensus across multiple groups, the services that are being implemented are very impressive.” One promising example: researchers at the University of Strasbourg in France created Unified Column Descriptors (UCDs), standard names for the columns in astronomical tables, that Alex Szalay of Baltimore’s Johns Hopkins University, one of the NVO’s two principal investigators, has semantically mapped to 1,300 Sloan items.
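The value of shared column descriptors is easiest to see in a toy example. In the Python sketch below, two made-up catalogs use different column names for the same quantities, and each carries a mapping from its local names to a common descriptor. The catalogs, the helper functions, and the descriptor strings are all illustrative assumptions rather than an excerpt from the Strasbourg vocabulary or from Szalay's mapping.

```python
# Two hypothetical catalogs that name the same quantities differently.
# The "ucds" mapping ties each local column name to a shared descriptor;
# the descriptor strings here are illustrative placeholders.
CATALOG_A = {
    "ucds": {"ra": "POS_EQ_RA_MAIN", "dec": "POS_EQ_DEC_MAIN"},
    "rows": [{"ra": 10.68, "dec": 41.27}],
}
CATALOG_B = {
    "ucds": {"RAJ2000": "POS_EQ_RA_MAIN", "DEJ2000": "POS_EQ_DEC_MAIN"},
    "rows": [{"RAJ2000": 83.82, "DEJ2000": -5.39}],
}


def column_for(catalog, ucd):
    """Find the local column name that carries the given descriptor."""
    for column, descriptor in catalog["ucds"].items():
        if descriptor == ucd:
            return column
    raise KeyError(f"no column tagged {ucd!r}")


def values_by_ucd(catalog, ucd):
    """Read a column by its meaning rather than by its local name."""
    column = column_for(catalog, ucd)
    return [row[column] for row in catalog["rows"]]


def positions(catalog):
    """Return (ra, dec) pairs without knowing the catalog's column names."""
    ras = values_by_ucd(catalog, "POS_EQ_RA_MAIN")
    decs = values_by_ucd(catalog, "POS_EQ_DEC_MAIN")
    return list(zip(ras, decs))


if __name__ == "__main__":
    for catalog in (CATALOG_A, CATALOG_B):
        print(positions(catalog))
```

Software written this way asks for “the main right ascension column” instead of a hard-coded name, which is what lets one tool work across catalogs it has never seen.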
With so many sites providing content, the NVO will also need a way to indicate how reliable its data is, cautions Michael Skrutskie, principal investigator of the Two Micron All Sky Survey and a professor of astronomy at the University of Virginia in Charlottesville. “People will need to know how much trust they can put into those data points.” Skrutskie suggests the issue might be solved with a labeling system, and Hanisch says a peer-review process for one is in the works.
Proponents say the NVO could be up in two to three years, especially if there’s money for the operational phase. They plan to demonstrate real-time analysis of clustered galaxies at a January 2003 meeting of the American Astronomical Society in Seattle. The first showing of interoperability among international VOs should be ready for the July 2003 general assembly of the International Astronomical Union in Sydney, Australia. “I think by the end of two years, we’ll have interoperable data centers and a bunch of toolkits,” predicts Jim Annis, an astrophysicist at the Fermi National Accelerator Laboratory in Batavia, IL. Longer term, the project could still falter if the NVO’s middleware standards make it too expensive for institutions to prepare their survey data, the fault that doomed the pre-Web Astrophysics Data System, in Hanisch’s opinion.
Regardless of how it gets assembled, astronomers seem excited about the NVO’s potential as a research tool, sometimes referring to it as an instrument on a par with the telescope. “The important part of it is just being able to do searches and queries and being able to get all that information on one object,” says Dave Turnshek, a professor in the astrophysics and astronomy department at the University of Pittsburgh, one of 17 research centers sharing the NSF grant. Turnshek’s school paid to get access to the Sloan survey, and he uses it heavily for his research in quasar and galaxy formation. “The exciting thing about the NVO is, eventually everybody will be able to do that,” he says.
Adds Hanisch: “My wildest dreams of success are that the VO stuff becomes just part of doing astronomy. It will be just like going to Google.”