Deep Learning Is a Black Box, but Health Care Won’t Mind

These algorithms are able to diagnose disease as accurately as expert physicians.
Jon Han

Earlier in 2017, artificial intelligence scientist Sebastian Thrun and colleagues at Stanford University demonstrated that a “deep learning” algorithm was capable of diagnosing potentially cancerous skin lesions as accurately as a board-certified dermatologist.

The cancer finding, reported in Nature, was part of a stream of reports offering an early glimpse into what could be a new era of “diagnosis by software,” in which artificial intelligence aids doctors—or even competes with them.

Experts say medical images, like photographs, x-rays, and MRIs, are a nearly perfect match for the strengths of deep-learning software, which has in the past few years led to breakthroughs in recognizing faces and objects in pictures.

Companies are already in pursuit. Verily, Alphabet’s life sciences arm, joined forces with Nikon last December to develop algorithms to detect causes of blindness in diabetics. The field of radiology, meanwhile, has been dubbed the “Silicon Valley of medicine” because of the number of detailed images it generates.

Black-box medicine

Although the predictions by Thrun’s team were highly accurate, no one was sure exactly which features of a mole the deep-learning program used to classify it as cancerous or benign. The result is the medical version of what’s been termed deep learning’s “black box” problem.

Unlike more-traditional vision software, where a programmer defines rules—for example, a stop sign has eight sides—in deep learning the algorithm finds the rules itself, but often without leaving an audit trail to explain its decisions.
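The contrast can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the single feature (lesion diameter), the 6 mm cut-off, and the six labeled examples are invented, and none of it comes from the Stanford study. The first function is a rule a programmer wrote down; the second derives its rule from labeled examples. With one number the learned rule is still readable, but a deep network learns millions of such numbers at once, which is where the audit trail disappears.

```python
def rule_based(diameter_mm: float) -> bool:
    """Hand-written rule: a person chose the 6 mm cut-off (illustrative value)."""
    return diameter_mm > 6.0


def learn_threshold(examples):
    """Pick the cut-off that misclassifies the fewest labeled examples."""
    best_cut, best_errors = None, len(examples) + 1
    for cut in sorted(d for d, _ in examples):
        # Count examples where "diameter >= cut" disagrees with the expert label.
        errors = sum((d >= cut) != label for d, label in examples)
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut


# Invented toy data: (diameter in mm, expert label: True = suspicious).
toy_examples = [
    (2.0, False), (3.5, False), (4.0, False),
    (5.5, True), (7.0, True), (9.0, True),
]
learned_cut = learn_threshold(toy_examples)


def learned_rule(diameter_mm: float) -> bool:
    """Rule recovered from the data rather than written by a programmer."""
    return diameter_mm >= learned_cut


if __name__ == "__main__":
    print("learned cut-off:", learned_cut)
    print("5.5 mm lesion, hand-written rule:", rule_based(5.5))
    print("5.5 mm lesion, learned rule:", learned_rule(5.5))
```

On this toy data the two rules disagree about a 5.5 mm lesion, because the learned cut-off follows the examples rather than the programmer's intuition.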

“In the case of black-box medicine, doctors can’t know what is going on because nobody does; it’s inherently opaque,” says Nicholson Price, a legal scholar from the University of Michigan who focuses on health law.

Yet Price says that may not pose a serious obstacle in health care. He likens deep learning to drugs whose benefits come about by unknown means. Lithium is one example. Its exact biochemical mechanism in affecting mood has yet to be elucidated, but the drug is still approved for treatment of bipolar disorder. The mechanism behind aspirin, the most widely used medicine of all time, wasn’t understood for 70 years.

Similarly, Price says, the black-box issue won’t pose a problem with the U.S. Food and Drug Administration, which, in addition to approving new drugs, regulates software if its purpose is to treat or prevent disease.

In a statement, the FDA says that over the past 20 years it has approved “a number of image analysis applications that rely on a variety of pattern recognition, machine learning, and computer vision techniques.” The agency confirmed that it’s seeing more software powered by deep learning and notes that companies are allowed to keep the details of their algorithms confidential.

The FDA has already given the green light to at least one deep-learning algorithm. In January the FDA cleared for sale software developed by Arterys, a privately held medical-imaging company based in San Francisco. Its algorithm, “DeepVentricle,” analyzes MRI images of the interior contours of the heart’s chambers and calculates the volume of blood a patient’s heart can hold and pump. That calculation is completed in less than 30 seconds, Arterys says, whereas conventional methods typically take an hour.
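For readers curious about the arithmetic behind “hold and pump,” here is a minimal sketch using the standard textbook definitions of stroke volume and ejection fraction. It is not Arterys’s method: the hard part of the product is tracing the chamber contours in the images, which this sketch skips entirely. The function names and the sample volumes are assumptions chosen for illustration.

```python
def stroke_volume_ml(end_diastolic_ml: float, end_systolic_ml: float) -> float:
    """Blood pumped per beat: volume when full minus volume after contraction."""
    return end_diastolic_ml - end_systolic_ml


def ejection_fraction(end_diastolic_ml: float, end_systolic_ml: float) -> float:
    """Fraction of the filled ventricle's blood that is pumped out per beat."""
    if end_diastolic_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return stroke_volume_ml(end_diastolic_ml, end_systolic_ml) / end_diastolic_ml


if __name__ == "__main__":
    # Hypothetical ventricle: holds 120 mL when full, 50 mL after contracting.
    edv, esv = 120.0, 50.0
    print(f"stroke volume: {stroke_volume_ml(edv, esv):.0f} mL")
    print(f"ejection fraction: {ejection_fraction(edv, esv):.1%}")
```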

The FDA required Arterys to do extensive testing to make sure the results from its algorithm were on par with those generated by physicians. “You need to prove statistically that your algorithm is following whatever its intended use is or [what the] marketing claims say it’s doing,” says John Axerio-Cilies, the company’s chief technology officer.

Big demand

To train their software, the team led by Thrun, a former vice president at Google who worked on driverless cars there, fed it 129,450 images of skin conditions evaluated by experts. These covered 2,032 different diseases and included 1,942 images of confirmed skin cancers.

Eventually the software was able to outperform 21 dermatologists in identifying which moles were potentially cancerous.

“When dermatologists see the potential of this technology, I think most will embrace it,” says Robert Novoa, a Stanford dermatologist and an author of the study. He and other team members declined to say if they plan to commercialize the software.

Any worry that doctors will soon be out of a job is also misplaced, says Allan Halpern, a Memorial Sloan Kettering dermatologist and president of the International Society for Digital Imaging of the Skin. “I think the threat is the opposite,” he says. Algorithms “could drive the demand for dermatological services up dramatically.”

That’s because a positive on a screening test still requires a biopsy. Deep-learning software could find a role in primary-care offices, Halpern says, but if it were made available as a population-wide screening test, or through a consumer app, there wouldn’t be enough dermatologists to follow up on the leads.
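Halpern’s point about follow-up load can be illustrated with a back-of-the-envelope calculation. The numbers below are assumptions picked only to show the shape of the problem, not figures from the study or from Halpern: a screened population of one million, a disease prevalence of 0.1 percent, and a test with 90 percent sensitivity and 90 percent specificity.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Return (number of positive results, share of positives that are real)."""
    diseased = population * prevalence
    healthy = population - diseased
    true_positives = diseased * sensitivity
    false_positives = healthy * (1.0 - specificity)
    positives = true_positives + false_positives
    # Positive predictive value: how often a flagged case is actually disease.
    ppv = true_positives / positives if positives else 0.0
    return positives, ppv


if __name__ == "__main__":
    # All four inputs are hypothetical, chosen for illustration only.
    positives, ppv = screening_outcomes(
        population=1_000_000, prevalence=0.001,
        sensitivity=0.90, specificity=0.90,
    )
    print(f"people flagged for follow-up: {positives:,.0f}")
    print(f"share of flagged people with disease: {ppv:.2%}")
```

Under these assumed inputs, roughly 100,000 people would be flagged and fewer than 1 in 100 of them would have the disease, so nearly all of the follow-up work would go to people who turn out to be healthy.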

Axerio-Cilies says companies will be tempted to offer deep-learning tools directly to consumers. For instance, people might scan their own moles to see if they need to visit a doctor. Some non-AI cellphone apps, like Mole Mapper, already allow people to track suspicious moles and record any changes over time.

Halpern, however, says he doesn’t think consumers are ready to deal with diagnostic systems that might tell them a mole has a 5 percent chance, or a 50 percent chance, of being cancer.

“We aren’t great at using probabilities,” he says.

