The news: An international consortium of medical experts has introduced the first official standards for clinical trials that involve artificial intelligence. The move comes at a time when hype around medical AI is at a peak, with inflated and unverified claims about the effectiveness of certain tools threatening to undermine people’s trust in AI overall.
What it means: Announced in Nature Medicine, the British Medical Journal, and the Lancet, the new standards extend two sets of guidelines around how clinical trials are conducted and reported that are already used around the world for drug development, diagnostic tests, and other medical interventions. AI researchers will now have to describe the skills needed to use an AI tool, the setting in which the AI is evaluated, details about how humans interact with the AI, the analysis of error cases, and more.
Why it matters: Randomized controlled trials are the most trustworthy way to demonstrate the effectiveness and safety of a treatment or clinical technique. They underpin both medical practice and health policy. But their trustworthiness depends on whether researchers stick to strict guidelines in how their trials are carried out and reported. In the last few years, many new AI tools have been developed and described in medical journals, but their effectiveness has been hard to compare and assess because the quality of trial designs varies. In March, a study in the BMJ warned that poor research and exaggerated claims about how good AI was at analyzing medical images posed a risk to millions of patients.
Peak hype: A lack of common standards has also allowed private companies to crow about the effectiveness of their AI without facing the scrutiny applied to other types of medical intervention or diagnosis. For example, the UK-based digital health company Babylon Health came under fire in 2018 for announcing that its diagnostic chatbot was “on par with human doctors,” on the basis of a test that critics argued was misleading.
Babylon Health is far from alone. Developers have been claiming that medical AIs outperform or match human ability for some time, and the pandemic has sent this trend into overdrive as companies compete to get their tools noticed. In most cases, evaluation of these AIs is done in-house and in favorable conditions.
Future promise: That’s not to say AI can’t beat human doctors. In fact, the first independent evaluation of an AI diagnostic tool that outperformed humans in spotting cancer on mammograms was published only last month. The study found that a tool made by Lunit AI and used in certain hospitals in South Korea finished in the middle of the pack of radiologists it was tested against. It was even more accurate when paired with a human doctor. By separating the good from the bad, the new standards will make this kind of independent evaluation easier, ultimately leading to better—and more trustworthy—medical AI.