
A leading AI conference is trying to fix the field’s reproducibility crisis

April 9, 2019

Last week, organizers of the Neural Information Processing Systems Conference (NeurIPS), one of the world’s largest annual AI research conferences, updated their paper-submission policy to require what they’re calling a reproducibility checklist. It’s a small shift in a larger fight to curb science’s growing “reproducibility crisis,” in which a disconcerting number of research findings cannot be successfully replicated by other researchers, casting doubt on the validity of the original results.

In February, a statistician at Rice University warned that machine-learning techniques are likely fueling that crisis because the results they produce are difficult to audit. This is worrying because machine learning is increasingly being applied in important areas such as health care and drug research.

NeurIPS’s reproducibility checklist tries to tackle the problem. Among other things, researchers must provide a clear description of their algorithm; a complete description of their data collection process; a link to any simulation environment they used during training; and a comprehensive account of what data they kept, what they discarded, and why. The idea is to create a new standard of transparency, so researchers show exactly how they arrived at their conclusions.
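
To make the idea concrete, here is a minimal Python sketch of the kind of bookkeeping such a checklist encourages: fixing every source of randomness so a run can be repeated, and recording the training configuration and software environment alongside the results. The set_seed and log_provenance helpers, and the specific fields they record, are assumptions invented for this illustration; they are not part of the actual NeurIPS checklist.

import json
import platform
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    # Fix every common source of randomness so an experiment
    # can be rerun with identical results. (Illustrative helper,
    # not from the NeurIPS checklist.)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


def log_provenance(path: str, config: dict) -> None:
    # Save the hyperparameters and the software environment next to
    # the results, so a reviewer can audit how they were produced.
    record = {
        "config": config,                      # hyperparameters, data splits, etc.
        "python": platform.python_version(),
        "torch": torch.__version__,
        "numpy": np.__version__,
    }
    with open(path, "w") as f:
        json.dump(record, f, indent=2)


set_seed(42)
log_provenance("run_provenance.json", {"lr": 1e-3, "epochs": 10, "seed": 42})
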

As the “world’s most significant AI conference,” wrote Jack Clark, the policy director of the nonprofit OpenAI, in his weekly newsletter Import AI, “NeurIPS 2019 policy will have [a] knock-on effect across [the] wider AI ecosystem.”

