The news: An international consortium of medical experts has introduced the first official standards for clinical trials that involve artificial intelligence. The move comes at a time when hype around medical AI is at a peak, with inflated and unverified claims about the effectiveness of certain tools threatening to undermine people’s trust in AI overall.
What it means: Announced in Nature Medicine, the British Medical Journal, and the Lancet, the new standards extend two sets of guidelines around how clinical trials are conducted and reported that are already used around the world for drug development, diagnostic tests, and other medical interventions. AI researchers will now have to describe the skills needed to use an AI tool, the setting in which the AI is evaluated, details about how humans interact with the AI, the analysis of error cases, and more.
Why it matters: Randomized controlled trials are the most trustworthy way to demonstrate the effectiveness and safety of a treatment or clinical technique. They underpin both medical practice and health policy. But their trustworthiness depends on whether researchers stick to strict guidelines in how their trials are carried out and reported. In the last few years, many new AI tools have been developed and described in medical journals, but their effectiveness has been hard to compare and assess because the quality of trial designs varies. In March, a study in the BMJ warned that poor research and exaggerated claims about how good AI was at analyzing medical images posed a risk to millions of patients.
Peak hype: A lack of common standards has also allowed private companies to crow about the effectiveness of their AI without facing the scrutiny applied to other types of medical intervention or diagnosis. For example, the UK-based digital health company Babylon Health came under fire in 2018 for announcing that its diagnostic chatbot was “on par with human doctors,” on the basis of a test that critics argued was misleading.
Babylon Health is far from alone. Developers have been claiming that medical AIs outperform or match human ability for some time, and the pandemic has sent this trend into overdrive as companies compete to get their tools noticed. In most cases, evaluation of these AIs is done in-house and in favorable conditions.
Future promise: That’s not to say AI can’t beat human doctors. In fact, the first independent evaluation of an AI diagnostic tool that outperformed humans in spotting cancer on mammograms was published only last month. The study found that a tool made by Lunit AI and used in certain hospitals in South Korea finished in the middle of the pack of radiologists it was tested against. It was even more accurate when paired with a human doctor. By separating the good from the bad, the new standards will make this kind of independent evaluation easier, ultimately leading to better—and more trustworthy—medical AI.