Google’s AI breast cancer screening tool is learning to generalize across countries

Karen Hao

January 3, 2020

In a preliminary test, a model trained only on data from UK women still performed better than experts on US patients.

The news: DeepMind and Google Health have developed a new AI system to help doctors detect breast cancer early. The researchers trained an algorithm on mammogram images from female patients in the US and UK, and it performed better than human radiologists. The results were published in Nature on Wednesday.

A tragedy of errors: Breast cancer is the most common cancer for women globally and one of their leading causes of cancer death. Though early detection and treatment can improve a patient’s prognosis, screening tests have high error rates. Mammograms miss about 1 in 5 breast cancers that are actually present, an error known as a false negative. Meanwhile, about 50% of women who receive annual mammograms get at least one false alarm, known as a false positive, over a 10-year period.
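To make the two error types concrete, here is a minimal Python sketch of how a false negative rate and a false positive rate are computed from screening outcomes. The counts and the `ScreeningCounts` class are hypothetical, chosen only for illustration; they are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Hypothetical tally of screening outcomes (illustrative, not study data)."""
    true_positives: int   # cancer present, screening flagged it
    false_negatives: int  # cancer present, screening missed it
    true_negatives: int   # no cancer, screening cleared the patient
    false_positives: int  # no cancer, screening raised a false alarm

    def false_negative_rate(self) -> float:
        """Share of women with cancer whom the screening missed."""
        return self.false_negatives / (self.true_positives + self.false_negatives)

    def false_positive_rate(self) -> float:
        """Share of women without cancer who were flagged anyway."""
        return self.false_positives / (self.true_negatives + self.false_positives)


counts = ScreeningCounts(true_positives=80, false_negatives=20,
                         true_negatives=9_000, false_positives=900)
print(f"False negative rate: {counts.false_negative_rate():.0%}")  # 20%
print(f"False positive rate: {counts.false_positive_rate():.0%}")  # 9%
```

In this made-up example, 20 of 100 cancers are missed, which matches the "about 1 in 5" figure above; the false positive rate is measured against a different group entirely, the women who do not have cancer.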

The results: In tests, the AI system reduced both types of error. For US patients, it cut false negatives by 9.4 percentage points and false positives by 5.7; for UK patients, it cut them by 2.7 and 1.2 points, respectively. In a separate experiment, the researchers tested the system’s ability to generalize: they trained the model only on mammograms from UK patients and then evaluated it on US patients. The system still outperformed human radiologists, reducing false negatives by 8.1 percentage points and false positives by 3.5.
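A drop in an error rate can be stated in absolute terms (percentage points) or relative to the starting rate, and the two can look quite different. The sketch below uses hypothetical rates, not the study's figures, and the helper names are illustrative.

```python
def absolute_reduction(before: float, after: float) -> float:
    """Drop in an error rate, expressed in percentage points."""
    return (before - after) * 100


def relative_reduction(before: float, after: float) -> float:
    """Drop in an error rate as a share of the starting rate, in percent."""
    return (before - after) / before * 100


# Hypothetical false negative rates for radiologists and a model (not study data).
readers_fnr = 0.30
model_fnr = 0.206

print(f"{absolute_reduction(readers_fnr, model_fnr):.1f} percentage points")  # 9.4 percentage points
print(f"{relative_reduction(readers_fnr, model_fnr):.1f}% relative")  # 31.3% relative
```

With these made-up numbers, the same improvement reads as 9.4 points in absolute terms but about 31% in relative terms, so it matters which convention a reported figure uses.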

Why it matters: The system’s ability to generalize in this way has promising implications. It shows that it may be possible to overcome one of the biggest challenges facing AI adoption in health care: the need for ever more data to cover a representative patient population. But such results should also be interpreted with caution. Relatively speaking, the US and UK have quite similar populations. The system likely would not generalize as well to other parts of the world.

Related work: Last October, NYU researchers published a similar study describing an AI system for breast cancer screening that performed on par with human radiologists. The main differences were that the NYU study used only mammograms from US patients and compared its system with expert diagnoses made in an artificial lab setting. Google and DeepMind instead compared their system’s performance with real-world diagnoses.

Human and machine: Ultimately, both studies conclude that AI breast cancer screening tools should be used in tandem with human radiologists. The combination achieves the most accurate diagnoses while still reducing radiologists’ workload, freeing up their time to focus more on patient care.