Deep Learning Is a Black Box, but Health Care Won’t Mind
Earlier in 2017, artificial intelligence scientist Sebastian Thrun and colleagues at Stanford University demonstrated that a “deep learning” algorithm was capable of diagnosing potentially cancerous skin lesions as accurately as a board-certified dermatologist.
The cancer finding, reported in Nature, was part of a stream of reports offering an early glimpse into what could be a new era of “diagnosis by software,” in which artificial intelligence aids doctors—or even competes with them.
Experts say medical images, like photographs, x-rays, and MRIs, are a nearly perfect match for the strengths of deep-learning software, which has in the past few years led to breakthroughs in recognizing faces and objects in pictures.
Companies are already in pursuit. Verily, Alphabet’s life sciences arm, joined forces with Nikon last December to develop algorithms to detect causes of blindness in diabetics. The field of radiology, meanwhile, has been dubbed the “Silicon Valley of medicine” because of the number of detailed images it generates.
Although the predictions by Thrun’s team were highly accurate, no one was sure exactly which features of a mole the deep-learning program used to classify it as cancerous or benign. The result is the medical version of what’s been termed deep learning’s “black box” problem.
Unlike more-traditional vision software, where a programmer defines rules—for example, a stop sign has eight sides—in deep learning the algorithm finds the rules itself, but often without leaving an audit trail to explain its decisions.
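The contrast can be made concrete with a toy sketch. It is not the Stanford team's method, and it is far simpler than a deep network: a hand-written rule sits next to a tiny perceptron that learns its own rule from labeled examples. The two features (number of sides and a "redness" score), the six data points, and all function names are hypothetical, chosen only for illustration.

```python
# Hypothetical toy data: (number of sides, redness from 0 to 1).
# Label +1 means "stop sign", -1 means "not a stop sign".
EXAMPLES = [(8, 0.9), (8, 0.8), (8, 0.95), (3, 0.9), (4, 0.1), (4, 0.9)]
LABELS = [1, 1, 1, -1, -1, -1]


def rule_based_is_stop_sign(sides, redness):
    """Traditional approach: the programmer states the rule explicitly."""
    return sides == 8


def train_perceptron(examples, labels, max_epochs=1000):
    """Learn weights and a bias from labeled examples (classic perceptron)."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(examples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified: nudge the boundary toward x
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


def learned_is_stop_sign(model, sides, redness):
    """Learned approach: the 'rule' is whatever the numbers encode."""
    weights, bias = model
    return weights[0] * sides + weights[1] * redness + bias > 0


MODEL = train_perceptron(EXAMPLES, LABELS)

if __name__ == "__main__":
    print("hand-written rule:", rule_based_is_stop_sign(8, 0.85))
    print("learned rule:     ", learned_is_stop_sign(MODEL, 8, 0.85))
    print("learned parameters:", MODEL)
```

The hand-written function can be read and audited line by line. The learned model is just three numbers that happen to separate the examples. A real deep-learning system has millions of such parameters, which is why its reasoning is so much harder to trace.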
“In the case of black-box medicine, doctors can’t know what is going on because nobody does; it’s inherently opaque,” says Nicholson Price, a legal scholar from the University of Michigan who focuses on health law.
Yet Price says that may not pose a serious obstacle in health care. He likens deep learning to drugs whose benefits come about by unknown means. Lithium is one example. Its exact biochemical mechanism in affecting mood has yet to be elucidated, but the drug is still approved for treatment of bipolar disorder. The mechanism behind aspirin, the most widely used medicine of all time, wasn’t understood for 70 years.
Similarly, Price says, the black-box issue won’t pose a problem with the U.S. Food and Drug Administration, which, in addition to approving new drugs, also regulates software if its purpose is to treat or prevent disease.
In a statement, the FDA says that over the past 20 years it has approved “a number of image analysis applications that rely on a variety of pattern recognition, machine learning, and computer vision techniques.” The agency confirmed that it’s seeing more software powered by deep learning and notes that companies are allowed to keep the details of their algorithms confidential.
The FDA has already given the green light to at least one deep-learning algorithm. In January the FDA cleared for sale software developed by Arterys, a privately held medical-imaging company based in San Francisco. Its algorithm, “DeepVentricle,” analyzes MRI images of the interior contours of the heart’s chambers and calculates the volume of blood a patient’s heart can hold and pump. That calculation is completed in less than 30 seconds, Arterys says, whereas conventional methods typically take an hour.
The FDA required Arterys to do extensive testing to make sure the results from its algorithm were on par with those generated by physicians. “You need to prove statistically that your algorithm is following whatever its intended use is or [what the] marketing claims say it’s doing,” says John Axerio-Cilies, the company’s chief technology officer.
To train their software, the team led by Thrun, a former vice president at Google who worked on driverless cars there, fed it 129,405 images of skin conditions evaluated by experts. These covered 2,032 different diseases and included 1,942 images of confirmed skin cancers.
Eventually the software was able to outperform 21 dermatologists in identifying which moles were potentially cancerous.
“When dermatologists see the potential of this technology, I think most will embrace it,” says Robert Novoa, a Stanford dermatologist and an author of the study. He and other team members declined to say if they plan to commercialize the software.
Any worry that doctors will soon be out of a job is also misplaced, says Allan Halpern, a Memorial Sloan Kettering dermatologist and president of the International Society for Digital Imaging of the Skin. “I think the threat is the opposite,” he says. Algorithms “could drive the demand for dermatological services up dramatically.”
That’s because a positive on a screening test still requires a biopsy. Deep-learning software could find a role in primary-care offices, Halpern says, but if it were made available as a population-wide screening test, or through a consumer app, there wouldn’t be enough dermatologists to follow up on the leads.
Axerio-Cilies says companies will be tempted to offer deep-learning tools directly to consumers. For instance, people might scan their own moles to see if they need to visit a doctor. Some non-AI cellphone apps, like Mole Mapper, already allow people to track suspicious moles and record any changes over time.
Halpern, however, says he doesn’t think consumers are ready to deal with diagnostic systems that might tell them a mole has a 5 percent chance, or a 50 percent chance, of being cancer.
“We aren’t great at using probabilities,” he says.