Cancer’s “World Wide Web”

A lung image database is breathing life into “medical grid” vision.

Tom Mashberg

March 1, 2006

For several years, clinicians and computer scientists in the U.S. and abroad have been trying to improve cancer care, from diagnosis to treatment, by building vast, interconnected databases full of patient information. They call these repositories “medical grids” and envision the day when a physician in Strasbourg or New Delhi can see, for example, that an indecipherable image of a patient’s lung is very similar to that of a San Francisco patient, whose case history could inform the decision to perform a biopsy.

These nascent databases include not only patients’ medical histories, including such data as MRIs and CT scans, but also information about how they have responded to drugs. But the benefits of these under-construction grids have been slow to come, partly because of technical problems and partly because federal privacy rules make data sharing difficult. Now, a National Cancer Institute project could test a multihospital system for comparing lung cancer images as early as this year, a clear move toward putting grids to use.

Kenneth H. Buetow, director of the institute’s Center for Bioinformatics in Bethesda, MD, calls it a crucial first step toward “a World Wide Web of cancer research.”

In the past year or so, Buetow and his team have collected more than 50,000 images of lung cancers obtained from medical trials and archived them in a secure electronic repository at NCI. Their effort is part of a three-year, $60 million pilot project launched in 2004, which involves 50 cancer centers and more than 600 researchers. The archive is now available on the Internet at http://ncia.nci.nih.gov. In addition to other imaging projects, it contains a large collection of lung cancer cases followed throughout their therapy.

With the database now largely in place, testing is imminent. The image collection is intended to encourage and facilitate research into new software that can automatically compare images of lungs with those already in the database. In such software, algorithms will search for commonalities and build a directory of the likeliest matches. Clinicians in offices and hospitals will be able to compare the resulting lung images with the scans they need to evaluate.
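The matching step described above can be illustrated with a small sketch. It is not the NCI software, whose design the article does not describe. It assumes each scan has already been reduced to a short list of numeric features and uses cosine similarity as a stand-in for whatever comparison the researchers eventually adopt. The case IDs, feature values, and function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_matches(query, archive, top_k=3):
    """Return the top_k (case_id, score) pairs, most similar first."""
    scored = [
        (case_id, cosine_similarity(query, features))
        for case_id, features in archive.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical archive: case ID -> made-up image features.
archive = {
    "case-A": [0.9, 0.1, 0.3],
    "case-B": [0.2, 0.8, 0.5],
    "case-C": [0.6, 0.4, 0.4],
}

# Hypothetical features for the scan a clinician needs to evaluate.
query = [0.88, 0.12, 0.32]

matches = rank_matches(query, archive, top_k=2)
for case_id, score in matches:
    print(f"{case_id}: {score:.3f}")
```

In this toy data the new scan ranks closest to case-A, then case-C. A real system would need features extracted from the images themselves and a similarity measure validated against clinical outcomes.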

Comparing images is just the first step. If all goes well, within three years the National Cancer Institute hopes to conduct one or more clinical trials where a vast amount of medical data about lung cancer (including images, types of tumors, drug courses, patient outcomes, even the molecular profiles of the disease) would be used by physicians studying specific cases. The outcomes of these cases would be compared to those of cases treated through conventional approaches to cancer diagnosis. That comparison should yield information not just about the medical response of the patients but also about the accuracy with which the doctors made their diagnoses, and even the degree to which they adhered to standards of medical privacy.

Medical-grid researchers are not short on vision. In cases where the scans match, doctors hope to be able to bore deeper into the histories of similar cases and learn which drugs or surgeries worked best. And Buetow says his trials could actually hasten the day when some cancer diagnoses are automated. A doctor could input images (and as the grid expands, blood test results, descriptions of genetic markers, and other patient data) and learn how frequently near-identical test results from patients around the world correlate with specific malignancies such as lymphomas, melanomas, or sarcomas.

And in the future, as gene-sequencing costs come down, the NCI’s grid could even include patients’ genomic information. “The power of the grid is in its capability to aggregate and correlate more and more public-health data from around the world,” said Mary Kratz of the University of Michigan Medical School, a technical advisor to the grid research community. “The more data you have, the more knowledge you generate.”

Meanwhile, mundane technical problems need solving.

Since the data that accompany images vary in type and format from hospital to hospital, researchers are developing standard formats that can harmonize them all. “We’re asking researchers at many competitive institutions to tear down barriers to sharing vast amounts of data,” says Howard Bilofsky, senior fellow at the Center for Bioinformatics at the University of Pennsylvania, which participates in NCI’s project. “Being able to share information in grids across the world in the arena of life science research is not something that is easily done.”
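To make the harmonization problem concrete, here is a minimal sketch of mapping two hospitals’ differently named fields onto one shared record layout. The hospital names, field names, and shared schema are invented for illustration. The article does not say which formats the researchers are developing.

```python
# Hypothetical per-hospital field names mapped onto one shared schema.
FIELD_MAPS = {
    "hospital_a": {
        "pid": "patient_id",
        "modality": "scan_type",
        "dx": "diagnosis",
    },
    "hospital_b": {
        "PatientID": "patient_id",
        "ScanKind": "scan_type",
        "Diagnosis": "diagnosis",
    },
}


def harmonize(record, source):
    """Rename a hospital's fields to the shared schema and tidy the values.

    Fields with no entry in the mapping are dropped. Text values are trimmed,
    and everything except the patient ID is lowercased so that "CT" and
    " ct " compare equal.
    """
    field_map = FIELD_MAPS[source]
    common = {}
    for local_name, value in record.items():
        shared_name = field_map.get(local_name)
        if shared_name is None:
            continue  # not part of the shared schema
        if isinstance(value, str):
            value = value.strip()
            if shared_name != "patient_id":
                value = value.lower()
        common[shared_name] = value
    return common


record_a = harmonize(
    {"pid": "A-17", "modality": "CT", "dx": "Adenocarcinoma"},
    "hospital_a",
)
record_b = harmonize(
    {"PatientID": "B-42", "ScanKind": " ct ", "Diagnosis": "adenocarcinoma", "Ward": "3F"},
    "hospital_b",
)
print(record_a)
print(record_b)
```

Once both records share field names and value conventions, they can be searched and compared together, which is the point of the common formats Bilofsky’s colleagues are working toward.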
