A Reality Check for IBM’s AI Ambitions

IBM, number 39 on our list of the 50 Smartest Companies, overhyped its Watson machine-learning system, but the company still could have the best access to the kind of data needed to make medicine much smarter.

David H. Freedman

June 27, 2017

Leonard Greco

Paul Tang was with his wife in the hospital just after her knee replacement surgery, a procedure performed on about 700,000 people in the U.S. every year. The surgeon came by, and Tang, who is himself a primary-care physician, asked when he expected her to be back at her normal routines, given his experience with patients like her. The surgeon kept giving vague non-answers. “Finally it hit me,” says Tang. “He didn’t know.” Tang would soon learn that most physicians don’t know how their patients do in the ordinary measures of life back at home and at work—the measures that most matter to patients.

Tang still sees patients as a physician, but he’s also chief health transformation officer for IBM’s Watson Health (see "50 Smartest Companies 2017"). That’s the business group developing health-care applications for Watson, the machine-learning system that IBM is essentially betting its future on. Watson could deliver information that physicians are not getting now, says Tang. It could tell a doctor, for instance, how long it took for patients similar to Tang’s wife to be walking without pain, or climbing stairs. It could even help analyze images and tissue samples and determine the best treatments for any given patient.

It’s because of possibilities like this that health care is one of the hottest segments of the market for machine-learning technologies. Research firm CB Insights counts at least 106 startups that have sprung up since 2013 and are still in business.

None of those companies has garnered anywhere near the attention that Watson has, thanks to its victory on the television quiz show Jeopardy! in 2011 and zealous marketing by IBM ever since. But lately, much of the press for Watson has been bad. A heavily promoted collaboration with the M.D. Anderson Cancer Center in Houston fell apart this year. As IBM’s revenue has swooned and its stock price has seesawed, analysts have been questioning when Watson will actually deliver much value. “Watson is a joke,” Chamath Palihapitiya, an influential tech investor who founded the VC firm Social Capital, said on CNBC in May.

However, most of the criticism of Watson, even from M.D. Anderson, doesn’t seem rooted in any particular flaw in the technology. Instead, it’s a reaction to IBM’s overly optimistic claims of how far along Watson would be by now. In fact, it still seems likely that Watson Health will be a leader in applying AI to health care’s woes. If Watson has not, as of yet, accomplished a great deal along those lines, one big reason is that it needs certain types of data to be “trained.” And in many cases such data is in very short supply or difficult to access. That’s not a problem unique to Watson. It’s a catch-22 facing the entire field of machine learning for health care.

Though the problem of missing and inaccessible data may slow Watson down, it may hurt IBM’s competitors more. That’s because the best bet for getting the data lies in close partnerships with large health-care organizations that tend to be technologically conservative. And one thing IBM still does very well in comparison to startups, or even giant rivals like Apple and Google, is gain the trust of executives and IT managers at big organizations. The specific problems with the M.D. Anderson project notwithstanding, IBM has a crucial advantage. It’s getting Watson inside a wide range of medical centers, health-care administration groups, and life-science companies, all of which are positioned to provide the critical data needed to shape AI’s future in medicine.

Unrealistic timelines

The breakup with M.D. Anderson seemed to show IBM choking on its own hype about Watson.

The cancer center and IBM partnered in 2012. The goal was for Watson to read data about any patient’s symptoms, gene sequence, and pathology reports, combine it with physicians’ notes on the patient and relevant journal articles, and then help doctors come up with diagnoses and treatments. But IBM and M.D. Anderson both overinflated expectations for the technology. IBM claimed in 2013 that “a new era of computing has emerged” and gave Forbes the impression that Watson “now tackles clinical trials” and would be in use with patients in just a matter of months. In 2015, the Washington Post quoted an IBM Watson manager describing how Watson was busy establishing a “collective intelligence model between machine and man.” The Post said that the computer system was “training alongside doctors to do what they can’t.”

In February of this year, the University of Texas, which runs M.D. Anderson, announced it had shuttered the project, leaving the medical center out $39 million in payments to IBM—for a project originally contracted at $2.4 million. After four years it had not produced a tool for use with patients that was ready to go beyond pilot tests. M.D. Anderson wouldn’t comment to me about Watson specifically, but it appears that the problems stemmed mainly from internal struggles over how the project was managed and funded.

That’s not to say IBM has no troubles with Watson. Indeed, they’re larger than what any one implementation reveals.

To understand what’s slowing the progress, you have to understand how machine-learning systems like Watson are trained. Watson “learns” by continually rejiggering its internal processing routines in order to produce the highest possible percentage of correct answers on some set of problems, such as which radiological images reveal cancer. The correct answers have to be already known, so that the system can be told when it gets something right and when it gets something wrong. The more training problems the system can chew through, the better its hit rate gets.
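
The training loop described above can be sketched in a few lines of Python. This is only a toy illustration of supervised learning in general, not a description of how Watson works internally: the data, the two made-up numeric features per case, and the function names are all assumptions invented for the example. The point it shows is that the system can adjust itself only because each example arrives with a known correct answer.

```python
# Toy supervised learning (a simple perceptron). Everything here is
# hypothetical: the features, labels, and names are invented for illustration.

def predict(weights, bias, features):
    """Return 1 (flag the case) if the weighted score is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def accuracy(weights, bias, examples):
    """Fraction of examples whose prediction matches the known label."""
    hits = sum(predict(weights, bias, x) == label for x, label in examples)
    return hits / len(examples)

def train(examples, epochs=20, learning_rate=0.1):
    """Nudge the weights whenever a prediction disagrees with its label."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)  # 0 when correct
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

# Each example pairs two made-up measurements with an expert-supplied label.
LABELED_EXAMPLES = [
    ((0.9, 0.8), 1),
    ((0.8, 0.6), 1),
    ((0.7, 0.9), 1),
    ((0.2, 0.1), 0),
    ((0.1, 0.3), 0),
    ((0.3, 0.2), 0),
]

untrained = accuracy([0.0, 0.0], 0.0, LABELED_EXAMPLES)
weights, bias = train(LABELED_EXAMPLES)
trained = accuracy(weights, bias, LABELED_EXAMPLES)
print(f"hit rate before training: {untrained:.2f}, after: {trained:.2f}")
```

On this tiny, cleanly separable data set the hit rate climbs from 0.50 to 1.00. Without the label attached to each example, the `error` term could not be computed and no adjustment would be possible.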

That’s relatively simple when it comes to training the system to identify malignancies in x-rays. But for potentially groundbreaking puzzles that go well beyond what humans already do, like detecting the relationships between gene variations and disease, Watson has a chicken-and-egg problem: how does it train on data that no experts have already sifted through and properly organized? “If you’re teaching a self-driving car, anyone can label a tree or a sign so the system can learn to recognize it,” says Thomas Fuchs, a computational pathologist at Memorial Sloan-Kettering, a cancer center in New York. “But in a specialized domain in medicine, you might need experts trained for decades to properly label the information you feed to the computer.”

Some version of that stumbling block emerges in every domain in which IBM hopes to have Watson contribute—as it does for any company’s machine-learning solution. To train Watson to go through giant pools of data and pull out the few pieces of information important to a single patient, someone has to do it by hand first, for thousands and thousands of cases. To recognize genes linked to disease, Watson needs thousands of records of patients who have specific diseases and whose DNA has been analyzed. But those gene-and-patient-record combinations can be hard to come by. In many cases, the data simply doesn’t exist in the right format—or in any form at all. Or the data may be scattered throughout dozens of different systems, and difficult to work with.

Consider, for example, the goal of improving primary care by placing better data at the fingertips of clinicians. When doctors miss chances to treat relatively minor concerns during a routine primary-care visit, before a more advanced problem sends patients to an emergency room or a specialist, their health suffers and costs explode. “About one-third of every dollar spent on health is probably unnecessary,” says Anil Jain, IBM Watson Health’s chief medical officer, who is also a practicing primary-care physician. Machine learning is widely recognized as an opportunity to address that problem.

To really help doctors get better outcomes for patients, however, Watson will need to find correlations between what it reads in health records and what Tang calls “all the social determinants of health.” Those factors include whether patients are drug-free, avoiding the wrong foods, breathing clean air, and on and on. But Tang concedes that today almost no hospitals or medical practices get that data reliably for a significant percentage of patients. Part of the problem is that hospitals have been slow to take up modern, data-driven practices. “Health care has been an embarrassingly late adopter of technology,” says Manish Kohli, a physician and health-care informatics expert with the Cleveland Clinic.

Where the data does exist, IBM has often simply gone out and bought it. It has acquired companies such as Truven Health Analytics, Explorys, and Phytel, all of which were active in dealing with large data sets across hospitals and patient populations. And even after the demise of the M.D. Anderson deal, IBM has some critical partnerships that further its access to patient data.

One of them is with Atrius Health, a network of nearly 900 mostly primary-care physicians throughout the Boston area. The partnership aims to develop and test a Watson-based system capable of pulling out nuggets of information critical to an individual patient from an ocean of notes, records, and articles. “Seeking all the relevant information is an onerous job for primary-care physicians as things exist today,” says Atrius’s chief medical officer, Joe Kimura. Electronic health records may have made the problem even worse, he adds, because the advent of such systems has enormously increased the amount of data generated in each visit, without providing a standard format to allow easy retrieval.

Critically, many of the most important notes in patient records are sentences that a conventional IT system can’t make sense of. But Watson can apply the natural-language-processing skills developed for Jeopardy! in order to extract meaning from them. Ideally, it could then suggest ways physicians can help patients avoid the need for extensive care. “Why should we focus only on making sure we’ve done as good a job as possible with patients who break a hip,” asks Kimura, “when we can try to predict which patients are at risk for falls and help them not break the hip at all? We need to push our care upstream.”

A leukemia doctor at M.D. Anderson, Courtney DiNardo, used IBM’s Watson system while consulting with a patient in 2013.

IBM announced in 2015 that Watson’s diagnostic capabilities would be boosted by data obtained from Merge Healthcare, a medical imaging management company that IBM bought for about $1 billion.

Watson Health is also partnering with the Central New York Care Collaborative, a state-government-funded agency that works with some 2,000 health-care providers in six counties. The partnership is intended to support the goal of a 25 percent reduction in emergency room admissions and in hospital readmissions, which occur when patients who have been discharged from a hospital have to return to address related problems. It also provides potential access to vast amounts of patient data.

There are other ways to get such data. One of Google’s sister companies is trying to get it directly from patients themselves. Verily Life Sciences, a health-care division of Alphabet, is partnering with Duke and Stanford to develop a highly structured health database on some 10,000 volunteers. The database will be filled with information not only from their clinical visits but also from wearable health-monitoring devices. That could be a promising leap in data access, though it could take a decade or more to produce highly usable results.

Fuchs’s group at Memorial Sloan-Kettering hopes to train an AI system to read tissue-stain slides, a process that will require a large library of digitally annotated slides with confirmed diagnoses and other critical data. So the group is gearing up to produce 40,000 such slides a month on its own. “That’s far more than anyone else,” Fuchs says. “It’s an enormous task because of all the variability in biology.”

Even M.D. Anderson, despite the fate of the Watson project, is continuing a large program that began around the same time, focused on gathering 1,700 types of clinical data on every patient who walks in its doors. Andy Futreal, the scientist who runs the program, says combining that patient information with research data will be crucial for the sorts of capabilities that systems like Watson could provide. “Once we have the data in place, now you can get into the business of AI machine learning uncovering those factors dictating who does and who doesn’t do well with different treatments,” says Futreal.

IBM, for its part, continues to rack up data from partnerships. For cancer diagnosis and care alone, the company has partnered with Memorial Sloan-Kettering, the Mayo Clinic, the Harvard- and MIT-affiliated Broad Institute, and medical-test giant Quest Diagnostics. The Memorial Sloan-Kettering collaboration has already produced a system that sifts through journal literature to inform treatment decisions, and it has been rolled out to the Jupiter Medical Center in Florida and a hospital chain in India. On the drug-discovery front, Watson Health is working with the Barrow Neurological Institute, where Watson helped find five genes linked to ALS that had never before been associated with the disease, and with the Ontario Brain Institute, where Watson identified 21 promising potential drug candidates.

Will Watson eventually make a difference in improving health outcomes and lowering costs? Probably, says Stephen Kraus, a partner at the VC firm Bessemer Venture Partners who focuses on health care and has invested in AI health-care startups. “It’s all for real,” says Kraus. “This isn’t about putting out vaporware in order to boost stock prices.” But Kraus joins most experts in cautioning against unrealistic timelines or promises—some of which have come from IBM itself. “This is hard,” he says. “It’s not happening today, and it might not be happening in five years. And it’s not going to replace doctors.”
