
Mining Data for Better Medicine

The health battles of millions, recorded digitally, open a world of virtual research.
September 19, 2011

The antidepressant Paxil was approved for sale in 1992, the cholesterol-lowering drug Pravachol in 1996. Company studies proved that each drug, on its own, works and is safe. But what about when they are taken together?

Prescription for speed: Digitized medical records allow data mining of hospital cases.

By mining tens of thousands of electronic patient records, researchers at Stanford University quickly discovered an unexpected answer: people who take both drugs have higher blood glucose levels. The effect was even greater in diabetics, for whom excess blood sugar is a health danger.

The research is an example of the increasing ease with which scientists now scour digitized medical results, like glucose tests and drug prescriptions, to find hidden patterns. “You’re not constrained by the need to actually get patients lined up in a clinical trial that would be incredibly expensive,” says Russ Altman, director of Stanford’s Biomedical Informatics Training Program, whose group published the Paxil/Pravachol result in the journal Clinical Pharmacology and Therapeutics this July. “We had most of this paper done probably in a month.”

The spread of electronic patient records, with their computer-readable entries, is opening new possibilities for medical data mining. Instead of being limited to carefully planned studies on volunteers, scientists can increasingly carry out research virtually by sifting through troves of data collected from the unplanned experiments of real life, as preserved in medical records from scores of hospitals.

Such techniques are allowing researchers to ask questions never envisioned at the time of a drug’s approval, such as how a medicine might affect particular ethnicities. They are also being used to uncover evidence of economic problems, such as overbilling and unnecessary procedures. Mining of health records “is going to build advancements in research, but also efficiencies in the health delivery system,” says Margaret Anderson, executive director of FasterCures, a think tank in Washington, D.C.

Some large hospital systems that use electronic records now employ full-time database research teams. Laurence Meyer, associate chief of staff for research at the Salt Lake City Veterans Administration Medical Center, says he knows of more than 100 research projects using electronic records from the VA’s six million patients, who are seen at 152 hospitals and 804 outpatient clinics across the country.

“If you’re looking at a single hospital’s cases of, say, hypertrophic cardiomyopathy, you might have 20 or 30 over 10 years, whereas all of a sudden we’re looking at thousands of cases,” says Meyer.

Large numbers of patient records are critical to these efforts, researchers say. In 2002, in the best-known case of a medical discovery to emerge from a database, researchers with the California managed-care provider Kaiser Permanente helped show that the $2.5 billion pain drug Vioxx was killing people by causing heart attacks. The effect became apparent only after Kaiser combed the records of its eight million patients. Vioxx was subsequently pulled from the market.

Similarly, Altman’s group at Stanford is developing tools to sift through the U.S. Food and Drug Administration’s Adverse Event Reporting System, a database containing several million reports of drugs that have harmed patients. The researchers designed an algorithm that searched for patients taking widely prescribed drugs who suffered side effects similar to those seen in diabetics. A strong signal came from a combination of Paxil and Pravachol, which on their own had never been linked to changes in blood sugar.

To confirm the clue, Altman’s team pored over electronic patient records to identify people who had taken one of the drugs, then both, and whose blood sugar had been measured. When only 12 such cases turned up among 141,000 Stanford records, the researchers approached hospitals at Harvard and Vanderbilt Universities for more records. Altman says his team eventually identified 239 patients, enough for a virtual clinical trial that he says proved the drug combination raises blood sugar and could be a danger to diabetics.
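For readers curious what such a record search might look like in practice, here is a minimal sketch in Python. It is not the Stanford team's code; the record layout (`Prescription`, `GlucoseTest`), the function name `paired_glucose_changes`, and the simplification that a prescription never ends once started are all assumptions made for illustration. The drug names are the generic names of Paxil (paroxetine) and Pravachol (pravastatin). The sketch keeps only patients who started one drug, later added the other, and had glucose measured in both periods, then reports each patient's change in average glucose.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class Prescription:  # hypothetical record layout
    patient_id: str
    drug: str
    start: date


@dataclass(frozen=True)
class GlucoseTest:  # hypothetical record layout
    patient_id: str
    taken: date
    mg_dl: float


def paired_glucose_changes(prescriptions, tests,
                           drug_a="paroxetine", drug_b="pravastatin"):
    """Return {patient_id: mean glucose on both drugs - mean glucose on one}.

    Assumption: a drug is treated as taken continuously from its earliest
    start date, because this toy layout has no end dates.
    """
    # Earliest start date of each drug of interest, per patient.
    starts = {}
    for rx in prescriptions:
        if rx.drug in (drug_a, drug_b):
            key = (rx.patient_id, rx.drug)
            if key not in starts or rx.start < starts[key]:
                starts[key] = rx.start

    changes = {}
    for pid in {patient for patient, _ in starts}:
        a = starts.get((pid, drug_a))
        b = starts.get((pid, drug_b))
        if a is None or b is None or a == b:
            continue  # need a period on one drug, then a period on both
        second = max(a, b)  # date the second drug was added
        first = min(a, b)   # date the first drug was started
        before = [t.mg_dl for t in tests
                  if t.patient_id == pid and first <= t.taken < second]
        after = [t.mg_dl for t in tests
                 if t.patient_id == pid and t.taken >= second]
        if before and after:  # glucose must be measured in both periods
            changes[pid] = mean(after) - mean(before)
    return changes
```

The strict filters in a search like this help explain why so few usable cases turn up in a single hospital's records: each patient must satisfy every condition at once. A real study would also have to account for prescription end dates, other medications, and confounding factors, none of which this sketch attempts.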

Despite such successes, Altman and other medical researchers say data-mining research is held back by practical obstacles. Most medical information remains trapped in paper records and handwritten notes that can’t easily be read by computers or shared by researchers. According to the Centers for Disease Control and Prevention, in 2009 fewer than one in four doctors were using electronic records. Even when such records exist, differences in the way hospitals describe the same conditions can cause headaches for researchers.

In other cases, valuable data isn’t being released because of privacy or legal concerns. This year, the Wall Street Journal sued for release of a vast trove of government Medicare billing data, arguing that mining the data could reveal clear indications of fraud. In that case, the government’s concern is protecting the privacy of doctors, but patient privacy rights also frequently set limits on research.

Patient advocates believe that making use of digitized data should be a higher priority in medicine. “There’s just an incredibly wide range of possibilities for research from using all this aggregated data,” says FasterCures’ Anderson. “We’re asking, ‘Why aren’t we paying a little bit more attention to that?’”
