Deep Learning Is a Black Box, but Health Care Won’t Mind

These algorithms are able to diagnose disease as accurately as expert physicians.
Jon Han

Earlier in 2017, artificial intelligence scientist Sebastian Thrun and colleagues at Stanford University demonstrated that a “deep learning” algorithm was capable of diagnosing potentially cancerous skin lesions as accurately as a board-certified dermatologist.

The cancer finding, reported in Nature, was part of a stream of reports offering an early glimpse into what could be a new era of “diagnosis by software,” in which artificial intelligence aids doctors—or even competes with them.

Experts say medical images, like photographs, x-rays, and MRIs, are a nearly perfect match for the strengths of deep-learning software, which has in the past few years led to breakthroughs in recognizing faces and objects in pictures.

Companies are already in pursuit. Verily, Alphabet’s life sciences arm, joined forces with Nikon last December to develop algorithms to detect causes of blindness in diabetics. The field of radiology, meanwhile, has been dubbed the “Silicon Valley of medicine” because of the number of detailed images it generates.

Black-box medicine

Although the predictions by Thrun’s team were highly accurate, no one was sure exactly which features of a mole the deep-learning program used to classify it as cancerous or benign. The result is the medical version of what’s been termed deep learning’s “black box” problem.

Unlike more-traditional vision software, where a programmer defines rules—for example, a stop sign has eight sides—in deep learning the algorithm finds the rules itself, but often without leaving an audit trail to explain its decisions.
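
The contrast can be sketched in a few lines of Python. This is an illustration only: the function names, the toy lesion measurements, and the single-threshold “learner” are assumptions made for the example, not anything taken from the Stanford study. A real deep-learning system adjusts millions of parameters rather than one cutoff, which is why its learned rules are so much harder to read back.

```python
# Illustrative sketch only: the data and names below are hypothetical.

def is_stop_sign_by_rule(num_sides: int) -> bool:
    """Traditional approach: a programmer writes the rule down explicitly."""
    return num_sides == 8


def learn_threshold(examples: list[tuple[float, bool]]) -> float:
    """Learned approach: pick the cutoff that best separates labeled examples.

    Each example is (feature_value, label). The cutoff is chosen only because
    it makes the fewest mistakes on the data; no human stated it in advance.
    """
    candidates = sorted(value for value, _ in examples)
    best_threshold, best_errors = candidates[0], len(examples) + 1
    for threshold in candidates:
        # Count examples where "value >= threshold" disagrees with the label.
        errors = sum((value >= threshold) != label for value, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


# Hypothetical lesion diameters in millimeters, each labeled by an expert.
training_data = [
    (2.0, False), (3.5, False), (4.0, False),
    (6.5, True), (7.0, True), (9.0, True),
]
threshold = learn_threshold(training_data)


def predict(value: float) -> bool:
    """Apply the learned cutoff to a new measurement."""
    return value >= threshold
```

With one learned number, the rule is still easy to inspect. The audit trail disappears when the same idea is scaled up to a deep network operating on raw pixels.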

“In the case of black-box medicine, doctors can’t know what is going on because nobody does; it’s inherently opaque,” says Nicholson Price, a legal scholar from the University of Michigan who focuses on health law.

Yet Price says that may not pose a serious obstacle in health care. He likens deep learning to drugs whose benefits come about by unknown means. Lithium is one example. Its exact biochemical mechanism in affecting mood has yet to be elucidated, but the drug is still approved for treatment of bipolar disorder. The mechanism behind aspirin, the most widely used medicine of all time, wasn’t understood for 70 years.

Similarly, Price says, the black-box issue won’t pose a problem with the U.S. Food and Drug Administration, which, in addition to approving new drugs, also regulates software if its purpose is to treat or prevent disease.

In a statement, the FDA says that over the past 20 years it has approved “a number of image analysis applications that rely on a variety of pattern recognition, machine learning, and computer vision techniques.” The agency confirmed that it’s seeing more software powered by deep learning and notes that companies are allowed to keep the details of their algorithms confidential.

The FDA has already given the green light to at least one deep-learning algorithm. In January the FDA cleared for sale software developed by Arterys, a privately held medical-imaging company based in San Francisco. Its algorithm, “DeepVentricle,” analyzes MRI images of the interior contours of the heart’s chambers and calculates the volume of blood a patient’s heart can hold and pump. That calculation is completed in less than 30 seconds, Arterys says, whereas conventional methods typically take an hour.

The FDA required Arterys to do extensive testing to make sure the results from its algorithm were on par with those generated by physicians. “You need to prove statistically that your algorithm is following whatever its intended use is or [what the] marketing claims say it’s doing,” says John Axerio-Cilies, the company’s chief technology officer.

Big demand

To train their software, the team led by Thrun, a former vice president at Google who worked on driverless cars there, fed it 129,450 images of skin conditions evaluated by experts. These covered 2,032 different diseases and included 1,942 images of confirmed skin cancers.

Eventually the software was able to outperform 21 dermatologists in identifying which moles were potentially cancerous.

“When dermatologists see the potential of this technology, I think most will embrace it,” says Robert Novoa, a Stanford dermatologist and an author of the study. He and other team members declined to say if they plan to commercialize the software.

Any worry that doctors will soon be out of a job is also misplaced, says Allan Halpern, a Memorial Sloan Kettering dermatologist and president of the International Society for Digital Imaging of the Skin. “I think the threat is the opposite,” he says. Algorithms “could drive the demand for dermatological services up dramatically.”

That’s because a positive on a screening test still requires a biopsy. Deep-learning software could find a role in primary-care offices, Halpern says, but if it were made available as a population-wide screening test, or through a consumer app, there wouldn’t be enough dermatologists to follow up on the leads.
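
Halpern’s arithmetic can be sketched roughly as follows. Every number here is a hypothetical placeholder chosen for illustration (the population size, the prevalence, and the sensitivity and specificity of the screening tool); none of them comes from the study or from Halpern.

```python
# Rough sketch with hypothetical inputs; not figures from the study.

def expected_positives(
    population: int,
    prevalence: float,
    sensitivity: float,
    specificity: float,
) -> tuple[float, float]:
    """Return (true_positives, false_positives) expected from screening."""
    with_disease = population * prevalence
    without_disease = population - with_disease
    true_positives = with_disease * sensitivity
    false_positives = without_disease * (1 - specificity)
    return true_positives, false_positives


# Assumed scenario: 1 million people screened, 1 percent have the disease,
# and the tool is 90 percent sensitive and 90 percent specific.
true_pos, false_pos = expected_positives(1_000_000, 0.01, 0.90, 0.90)
referrals = true_pos + false_pos
```

Under these assumed inputs, about 9,000 true positives arrive alongside about 99,000 false positives, so roughly 108,000 people would need follow-up. Even an accurate test can generate far more leads than cancers when the condition is rare.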

Axerio-Cilies says companies will be tempted to offer deep-learning tools directly to consumers. For instance, people might scan their own moles to see if they need to visit a doctor. Some non-AI cellphone apps, like Mole Mapper, already allow people to track suspicious moles and record any changes over time.

Halpern, however, says he doesn’t think consumers are ready to deal with diagnostic systems that might tell them a mole has a 5 percent chance, or a 50 percent chance, of being cancer.

“We aren’t great at using probabilities,” he says.

