
Diagnosing with Data

The Mayo Clinic is transforming medicine with advanced computing.
December 1, 2003

Even a top-notch specialist like Piet de Groen, a gastroenterologist at the Mayo Clinic in Rochester, MN, can’t know everything about every illness his patients may suffer. But on the rare occasions that he encounters an ailment he’s never seen before, chances are another physician at the hospital has. So de Groen is developing an electronic “data warehouse” that allows him to type in a patient’s symptoms and, within seconds, get a list of all similar Mayo patient records. By 2004, after initial data security and patient confidentiality issues have been resolved, de Groen and his colleagues will be able to use these histories to make more accurate diagnoses. In the long term, they could even access your genetic profile to help choose a course of treatment.

The Mayo system is being built with the collaboration of IBM Life Sciences in Rochester and in Yorktown Heights, NY. Started in the winter of 2002, the project has already produced a large database of medical records and software that can find groups of patients with similar conditions and treatments. While hospitals and HMOs are increasingly using electronic records to track patient histories, the Mayo system goes further. It automatically groups patients according to the factors they have in common, allowing doctors to search quickly for combinations of factors. It will be used first for medical research but ultimately to improve patient care. “The application of information technology and bioinformatics is moving toward medicine and patient care much more rapidly than anyone anticipated,” says Carol Kovac, general manager of IBM Life Sciences.

The Mayo’s data warehouse contains 4.4 million patient histories recorded over the past five years. Doctors can search these records by symptoms, age, patient’s home state, date of diagnosis, and other factors; in 2004, drug information will become available as well. Because the records are already clustered according to common characteristics, a search can zero in on the most likely matches instead of poring over the entire database patient by patient. When a new patient’s information is entered, software automatically compares it with existing patterns and groups it accordingly. The result: the system could eventually operate fast enough to be used during a visit to the doctor’s office. A doctor might, for example, check how older female patients with a specific set of symptoms respond to a particular drug.
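The article does not describe how the Mayo and IBM software actually clusters records, so the following is only a minimal sketch of the general idea: file each record under a group key when it arrives, then search only the group that matches the query. The `Record` fields, the `cluster_key` rule (age decade plus sex), and the sample patients are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout, not the Mayo schema.
    patient_id: str
    age: int
    sex: str
    symptoms: frozenset


def cluster_key(age, sex):
    # Assumed grouping rule: age band (by decade) plus sex.
    return (age // 10, sex)


class ClusteredIndex:
    def __init__(self):
        self.clusters = defaultdict(list)

    def add(self, record):
        # A new record is filed with the existing group it matches.
        self.clusters[cluster_key(record.age, record.sex)].append(record)

    def search(self, age, sex, symptoms):
        # Scan only the one cluster that shares the query's characteristics,
        # rather than every record in the database.
        wanted = frozenset(symptoms)
        candidates = self.clusters.get(cluster_key(age, sex), [])
        return [r for r in candidates if wanted <= r.symptoms]


index = ClusteredIndex()
index.add(Record("p1", 72, "F", frozenset({"fatigue", "jaundice"})))
index.add(Record("p2", 75, "F", frozenset({"fatigue"})))
index.add(Record("p3", 34, "M", frozenset({"fatigue", "jaundice"})))

matches = index.search(age=70, sex="F", symptoms={"fatigue", "jaundice"})
print([r.patient_id for r in matches])
```

A production system would presumably use far richer similarity measures than a fixed key. The point of the sketch is only that grouping records up front lets a query skip most of the data.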

But doctors and patients want more. They want to know why drug therapies work for some but not others; which people are more susceptible to cancer; and ultimately, what the best treatment is for each individual. For these kinds of queries, clinicians need access to genetic information, such as that gleaned from microarrays that provide snapshots of the activity levels of thousands of genes. “Using genomics to affect the way medicine is done, and making a secure repository of information that doctors can access: these avenues will converge,” says Gustavo Stolovitsky, a computational biologist at IBM.

As a first step, researchers at IBM have developed smarter software to find patterns in large groups of genes that signify whether a group of patients with common symptoms has a certain variation of a disease, which should lead to better diagnoses for individuals. For example, the scientists developed a genetic screen for leukemia using algorithms that can efficiently examine large numbers of gene combinations. They used it to identify a unique signature of about 100 genes in Mayo patients with a common form of leukemia (see top image). In a separate test at Columbia University, doctors used the screen to diagnose the disease, with 100 percent accuracy. The hope is that the analysis software can be extended to other cancers and to cardiovascular disease.

Down the road, databases that incorporate such genetic data, combined with pattern recognition algorithms, could allow doctors to detect disease before symptoms emerge and create treatment plans customized to patients. “I am convinced,” says de Groen, “that in five years, for some tumors, we’ll be able to say, ‘We know from your DNA and study of your tumor that this drug will work, and with little toxicity.’” But translating prototypes into practical systems that can handle the extra genomic data and still be searchable, and affordable, will take a while.

Critical to implementation of the systems is overcoming concerns about security and patient privacy. In addition to enforcing “military-level security” on patient data, says de Groen, clinicians “have to make people understand that there are tons of benefits in having their information available.” And, he adds, policymakers must ensure that such medical information cannot be hijacked or used to deprive patients of insurance or other services. Currently, use of the Mayo database is restricted to project coordinators. But starting in January 2004, password-protected access will gradually be granted to Mayo doctors who file protocols that demonstrate a need to use the data in their research. Longer-term plans call for the database to be accessible to any doctor with a patient waiting in his or her office.

Ultimately, after the technical and social issues have been resolved, collaborations between infotech companies and clinicians will mean widespread improvements in patient care. “It might take a decade, but information will transform medicine for the ordinary Joe on the street,” says IBM’s Kovac. If she’s right, a quick database search might become a standard part of any medical checkup.

Projects Using Advanced Database Tools in Medicine

Institution | Application/Strategy
Duke University School of Medicine (Durham, NC) | Combining genetic markers, medical images, and clinical histories in an electronic health-care database
Hadassah Hospital (Jerusalem, Israel) | Integrating gene expression profiles and medical images into electronic patient records
iCapture Research Centre, University of British Columbia (Vancouver, British Columbia) | Correlating genetic markers and environmental factors with heart and lung diseases, using pattern discovery algorithms
Kobe General Hospital (Kobe, Japan) | Integrating patient records and genetic data for personalized care
Mayo Clinic (Rochester, MN) | Unifying patient records in a single cross-searchable database; later versions will include genomic information
University of California, San Diego, School of Medicine (San Diego, CA) | Testing a secure database that allows doctors and patients to access clinical records over the Internet; developing faster computing tools to analyze gene expression for diagnosis and treatment of cancer
