Cancer’s "World Wide Web"

A lung image database is breathing life into “medical grid” vision.
March 1, 2006

For several years, clinicians and computer scientists in the U.S. and abroad have been trying to improve cancer care–from diagnosis to treatment–by building vast, interconnected databases full of patient information. They call these repositories “medical grids” and envision the day when a physician in Strasbourg or New Delhi can see, for example, that an indecipherable image of a patient’s lung is very similar to that of a San Francisco patient, whose case history could inform the decision to perform a biopsy.

These nascent databases include not only patients’ medical histories, including such data as MRIs and CT scans, but also information about how they have responded to drugs. But the benefits of these under-construction grids have been slow to come, partly because of technical problems and partly because federal privacy rules make data sharing difficult. Now, a National Cancer Institute project could test a multihospital system for comparing lung cancer images as early as this year–a clear move toward putting grids to use.

Kenneth H. Buetow, director of the institute’s Center for Bioinformatics in Bethesda, MD, calls it a crucial first step toward “a World Wide Web of cancer research.”

In the past year or so, Buetow and his team have collected more than 50,000 images of lung cancers obtained from medical trials and archived them in a secure electronic repository at NCI. Their effort is part of a three-year, $60 million pilot project launched in 2004, which involves 50 cancer centers and more than 600 researchers. The archive is now available on the Internet. In addition to other imaging projects, it contains a large collection of lung cancer cases followed throughout their therapy.

With the database now largely in place, testing is imminent. The image collection is intended to encourage and facilitate research into new software that can automatically compare images of lungs with those already in the database. In such software, algorithms will search for commonalities and build a directory of the likeliest matches. Clinicians in offices and hospitals will be able to contrast the resulting lung images with the scans they need to evaluate.
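As a rough illustration of what "searching for commonalities" and building a list of likeliest matches could look like, here is a minimal sketch in Python. It is not NCI's software: it assumes each scan has already been reduced to a short list of numeric features, and the case names, feature values, and the choice of cosine similarity as the measure are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no information to compare
    return dot / (norm_a * norm_b)


def rank_matches(
    query: List[float], archive: Dict[str, List[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Score every archived case against the query and keep the best top_k."""
    scored = [
        (case_id, cosine_similarity(query, features))
        for case_id, features in archive.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical archive: case IDs mapped to made-up image features.
archive = {
    "case-A": [0.90, 0.10, 0.30],
    "case-B": [0.20, 0.80, 0.50],
    "case-C": [0.85, 0.15, 0.35],
}

# A new scan, reduced to the same (assumed) three features.
matches = rank_matches([0.88, 0.12, 0.32], archive, top_k=2)
for case_id, score in matches:
    print(f"{case_id}: similarity {score:.3f}")
```

A real system would have to derive its features from the images themselves and would likely use a far richer comparison, but the overall shape (score each archived case, sort, return the top few) is the "directory of likeliest matches" idea in miniature.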

Comparing images is just the first step. If all goes well, within three years the National Cancer Institute hopes to conduct one or more clinical trials where a vast amount of medical data about lung cancer–including images, types of tumors, drug courses, patient outcomes, even the molecular profiles of the disease–would be used by physicians studying specific cases. The outcomes of these cases would be compared to those of cases treated through conventional approaches to cancer diagnosis. That comparison should yield information not just about the medical response of the patients but also about the accuracy with which the doctors made their diagnoses, and even the degree to which they adhered to standards of medical privacy.

Medical-grid researchers are not short on vision. In cases where the scans match, doctors hope to be able to bore deeper into the histories of similar cases and learn which drugs or surgeries worked best. And Buetow says his trials could actually hasten the day when some cancer diagnoses are automated. A doctor could input images (and as the grid expands, blood test results, descriptions of genetic markers, and other patient data) and learn how frequently near-identical test results from patients around the world correlate with specific malignancies such as lymphomas, melanomas, or sarcomas.

And in the future, as gene-sequencing costs come down, the NCI’s grid could even include patients’ genomic information. “The power of the grid is in its capability to aggregate and correlate more and more public-health data from around the world,” said Mary Kratz of the University of Michigan Medical School, a technical advisor to the grid research community. “The more data you have, the more knowledge you generate.”

Meanwhile, mundane technical problems need solving.

Since the data that accompany images vary in type and format from hospital to hospital, researchers are developing standard formats that can harmonize them all. “We’re asking researchers at many competitive institutions to tear down barriers to sharing vast amounts of data,” says Howard Bilofsky, senior fellow at the Center for Bioinformatics at the University of Pennsylvania, which participates in NCI’s project. “Being able to share information in grids across the world in the arena of life science research is not something that is easily done.”
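To make the harmonization problem concrete, the sketch below shows one simple way records from different hospitals could be mapped onto a shared format. It is only an assumption-laden example in Python: the hospital names, the local field names, and the common schema (`age_years`, `modality`, `diagnosis`) are invented for illustration and do not come from the NCI project.

```python
from typing import Any, Dict

# Hypothetical per-hospital mappings: local field name -> common field name.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "hospital_a": {"pt_age": "age_years", "modality": "modality", "dx": "diagnosis"},
    "hospital_b": {"Age": "age_years", "ScanType": "modality", "Diagnosis": "diagnosis"},
}


def harmonize(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename a hospital's local fields to the shared schema.

    Fields the mapping does not know about are dropped rather than guessed at.
    """
    mapping = FIELD_MAPS[source]
    return {
        common: record[local]
        for local, common in mapping.items()
        if local in record
    }


record_a = harmonize("hospital_a", {"pt_age": 61, "modality": "CT", "dx": "adenocarcinoma"})
record_b = harmonize("hospital_b", {"Age": 61, "ScanType": "CT", "Diagnosis": "adenocarcinoma"})
print(record_a == record_b)
```

Once both records share the same field names, they can be stored and searched side by side, which is the practical point of agreeing on a standard format.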
