The news: An international consortium of medical experts has introduced the first official standards for clinical trials that involve artificial intelligence. The move comes at a time when hype around medical AI is at a peak, with inflated and unverified claims about the effectiveness of certain tools threatening to undermine people’s trust in AI overall.
What it means: Announced in Nature Medicine, the British Medical Journal, and the Lancet, the new standards extend two existing sets of guidelines on how clinical trials are conducted and reported. Those guidelines are already used around the world for drug development, diagnostic tests, and other medical interventions. AI researchers will now have to describe the skills needed to use an AI tool, the setting in which the AI is evaluated, details about how humans interact with the AI, the analysis of error cases, and more.
Why it matters: Randomized controlled trials are the most trustworthy way to demonstrate the effectiveness and safety of a treatment or clinical technique. They underpin both medical practice and health policy. But their trustworthiness depends on whether researchers stick to strict guidelines in how their trials are carried out and reported. In the last few years, many new AI tools have been developed and described in medical journals, but their effectiveness has been hard to compare and assess because the quality of trial designs varies. In March, a study in the BMJ warned that poor research and exaggerated claims about how good AI was at analyzing medical images posed a risk to millions of patients.
Peak hype: A lack of common standards has also allowed private companies to crow about the effectiveness of their AI without facing the scrutiny applied to other types of medical intervention or diagnosis. For example, the UK-based digital health company Babylon Health came under fire in 2018 for announcing that its diagnostic chatbot was “on par with human doctors,” on the basis of a test that critics argued was misleading.
Babylon Health is far from alone. Developers have been claiming that medical AIs outperform or match human ability for some time, and the pandemic has sent this trend into overdrive as companies compete to get their tools noticed. In most cases, evaluation of these AIs is done in-house and in favorable conditions.
Future promise: That’s not to say AI can’t beat human doctors. In fact, the first independent evaluation of an AI diagnostic tool that outperformed humans in spotting cancer on mammograms was published only last month. The study found that a tool made by Lunit AI and used in certain hospitals in South Korea finished in the middle of the pack of radiologists it was tested against. It was even more accurate when paired with a human doctor. By separating the good from the bad, the new standards will make this kind of independent evaluation easier, ultimately leading to better—and more trustworthy—medical AI.