As we launch AI systems from the lab into the real world, we need to be prepared for these systems to break in surprising and catastrophic ways. Sharon Li, an assistant professor at the University of Wisconsin, Madison, is a pioneer in an AI safety feature called out-of-distribution (OOD) detection. This feature, she says, helps AI models determine when they should abstain from action if faced with something they weren’t trained on.
Li developed one of the first algorithms on out-of-distribution detection for deep neural networks. Google has since set up a dedicated team to integrate OOD detection into its products. Last year, Li’s theoretical analysis of OOD detection was chosen from over 10,000 submissions as an outstanding paper by NeurIPS, one of the most prestigious AI conferences.
We’re currently in an AI gold rush, and tech companies are racing to release their AI models. But most of today’s models are trained to identify specific things and often fail when they encounter the unfamiliar scenarios typical of the messy, unpredictable real world. Their inability to reliably understand what they “know” and what they don’t “know” is the weakness behind many AI disasters.
Li’s approach embraces uncertainty by using machine learning to detect unknown data out in the world and design AI models to adjust to it on the fly. Out-of-distribution detection could help prevent accidents when autonomous cars run into unfamiliar objects on the road, or make medical AI systems more useful in finding a new disease. “In all those situations, what we really need [is a] safety-aware machine learning model that’s able to identify what it doesn’t know,” says Li.
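To make the idea of "knowing what it doesn't know" concrete, here is a minimal sketch of one simple, widely used baseline: a classifier abstains whenever its top softmax probability falls below a cutoff. This is not Li's algorithm. The function names, the labels, and the 0.8 threshold are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(logits, labels, threshold=0.8):
    """Return the top label, or None when the model is not confident enough.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out data.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None  # treat the input as possibly out-of-distribution
    return labels[best]


labels = ["car", "pedestrian", "cyclist"]
confident = predict_or_abstain([6.0, 1.0, 0.5], labels)  # one score dominates
unsure = predict_or_abstain([1.1, 1.0, 0.9], labels)     # scores nearly tied
```

In this toy run, the first input yields a label and the second yields `None`, which a downstream system could treat as a signal to hand control to a human or a fallback routine.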
Connor Coley, 29, developed open-source software that uses artificial intelligence to help discover and synthesize new molecules. The suite of tools, called ASKCOS, is used in production by over a dozen pharmaceutical companies, and tens of thousands of chemists, to create new medicines, new materials, and more efficient industrial processes.
One of the largest bottlenecks in developing new molecules has long been identifying interesting candidates to test. This process has played out in more or less the same way for decades: make a small change to a known molecule, and then test the novel creation for its biological, chemical, or physical properties.
Coley’s approach includes a form of generative AI for chemistry. A chemist flags which properties are of interest, and AI-driven algorithms suggest new molecules with the greatest potential to have those properties. The system does this by analyzing known molecules and their current properties, and then predicting how small structural changes are likely to result in new behaviors.
As a result, chemists should spend less time testing candidates that never pan out. “The types of methods that we work on have led to factors of maybe two, three, maybe 10 [times] reduction in the number of different shots on goal you need to find something that works well,” says Coley, who is now an assistant professor of chemical engineering and computer science at MIT.
Once it identifies the candidate molecules, Coley’s software comes up with the best way to produce them. Even if chemists “imagine or dream up a molecule,” he says, figuring out how to synthesize something isn’t trivial: “We still have to make it.”
To that end, the system gives chemists a “recipe” of steps to follow that are likely to result in the highest yields. Coley’s future work includes figuring out how to add laboratory robots to the mix, so that even more automated systems will be able to test and refine the proposed recipes by actually following them.
Catherine De Wolf
Catherine De Wolf, 34, is using AI to help reduce emissions and waste of materials in the construction industry. Her goal is to aid the transition away from a one-time-use building philosophy, where materials used in construction are discarded when a building is torn down, to a circular one, where old building materials are reused since they are cheaper than sourcing new ones.
Old buildings set for demolition often contain a large cache of pricey, semi-finished elements like windows, metal, and wood. But since nobody really knows exactly what any given building contains, it’s usually easier, and cheaper, to demolish them and send the waste to a landfill. Then new materials have to be produced for new construction, a process that generates extra emissions.
“What I thought was: What if we had some tools to scan buildings easily, having the dimensions digitized easily, the type of materials, the condition of the material,” says De Wolf, “and we put that into some kind of Tinder for reusable building materials?”
First, she and her team fed data from Google Street View, lidar scans, and building documents into an AI system they built that can predict what materials each building is likely to contain, and how to design future projects with those materials. Then she and her team tagged recovered materials with QR codes—a process she hopes will become standard when buildings are constructed in the first place. The QR codes link to a database that provides a history of the material and its important physical characteristics.
In one project, a team led by De Wolf, who is an assistant professor in architecture at ETH Zurich, helped match the iconic glass panels from the Centre Pompidou in Paris—which were being replaced in response to regulatory changes—with a firm that used them to build small office rooms. In a second project, she used her methods to create a geodesic dome built entirely from materials salvaged from an old car warehouse in Geneva. She envisions developing an app that will match reused materials with future projects.
Alhussein Fawzi, 34, is pioneering the use of game-playing AI to speed up fundamental computations. Small improvements to popular algorithms can make a huge difference, cutting costs and saving energy across every device that runs them.
But identifying shortcuts in code that has been studied by human scientists for decades is hard. Fawzi’s key insight was to treat the problem of finding new algorithms as a kind of game—and use DeepMind’s game-playing AI, AlphaZero, to master it.
To make moves in a game like chess, AlphaZero searches through an astronomical number of possibilities before picking a move that is most likely to lead to a win. Lining up the sequential steps in a correct algorithm is a little like choosing moves in a winning game. Like chess, it involves scouring through countless possibilities to reach a goal.
Using an adapted version of AlphaZero, Fawzi and his colleagues found a way to speed up matrix multiplication, a fundamental element of math at the heart of many common computer programs in areas from graphics to physics to machine learning itself. They discovered algorithms that were faster than the previous best human-devised ones, beating a record that had stood for 50 years.
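For a sense of what a matrix multiplication "shortcut" looks like, consider Strassen's classic human-devised method from 1969, which multiplies two 2x2 matrices using 7 scalar multiplications instead of the usual 8. The sketch below is included only as an illustration of the kind of saving involved; it is not one of the algorithms Fawzi's team discovered, and the function names are made up for this example.

```python
def multiply_naive(a, b):
    """Textbook 2x2 matrix product: 8 scalar multiplications."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def multiply_strassen(a, b):
    """Strassen's 2x2 product: 7 scalar multiplications, more additions."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # The seven products.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Recombine them using only additions and subtractions.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = multiply_strassen(a, b)
```

Saving one multiplication looks trivial, but when the trick is applied recursively to large matrices split into blocks, the savings compound at every level.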
Google DeepMind has also used Fawzi’s approach to discover previously unknown shortcuts in sorting algorithms, another fundamental computation that runs trillions of times a day.
“It’s astounding when you think that many of the basic algorithms that we use today were really invented before the era of modern computers, most of them on paper,” says Fawzi. “There’s mileage in using machine learning to try to improve on them.”
In the race to build bigger and better AI models, tech companies are hiding a dirty secret: AI’s carbon footprint. AI systems require massive amounts of energy and water to build and run, and once deployed, they can emit several metric tons of carbon a day.
Sasha Luccioni, 33, a researcher at the AI startup Hugging Face, has developed a better way for tech companies to estimate and measure the carbon footprint of AI language models. Luccioni’s method helps companies calculate the carbon dioxide emissions of their AI systems in a way that accounts for climate impacts during their entire life cycle, including the energy, materials, and computing power needed to train them. For example, her team found that training, building, and running Hugging Face’s AI language model BLOOM has generated around 50 metric tons of carbon dioxide emissions.
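The arithmetic behind a life-cycle estimate of this kind can be sketched in a few lines: electricity use is multiplied by the carbon intensity of the grid that supplied it, and the emissions embodied in manufacturing the hardware are added on top. This is a rough illustration, not Luccioni's methodology. The function names and every input number below are hypothetical placeholders, not figures from her study.

```python
def operational_emissions_kg(energy_kwh, grid_kg_co2_per_kwh):
    """Emissions from electricity: energy used times grid carbon intensity."""
    return energy_kwh * grid_kg_co2_per_kwh


def life_cycle_emissions_kg(training_kwh, deployment_kwh, grid_kg_co2_per_kwh, embodied_kg):
    """Add up training, deployment, and hardware-manufacturing emissions."""
    training = operational_emissions_kg(training_kwh, grid_kg_co2_per_kwh)
    deployment = operational_emissions_kg(deployment_kwh, grid_kg_co2_per_kwh)
    return {
        "training": training,
        "deployment": deployment,
        "embodied": embodied_kg,
        "total": training + deployment + embodied_kg,
    }


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
estimate = life_cycle_emissions_kg(
    training_kwh=100_000,
    deployment_kwh=20_000,
    grid_kg_co2_per_kwh=0.05,
    embodied_kg=2_000,
)
total_tonnes = estimate["total"] / 1000  # kilograms to metric tons
```

The same energy use can produce very different totals depending on the grid: rerunning the sketch with a higher carbon intensity raises the training and deployment terms while leaving the embodied term unchanged.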
Her work “represents the most thorough, honest, and knowledgeable analysis of the carbon footprint of a large ML model to date,” Emma Strubell, an assistant professor in the school of computer science at Carnegie Mellon University, who wrote a seminal 2019 paper on AI’s impact on the climate, told MIT Technology Review in November.
Luccioni says her approach helps people make more informed choices about AI. She says nobody else has done such an in-depth audit of a language model’s emissions. CodeCarbon, the tool she helped create, has now been downloaded over 300,000 times.
“Understanding the environmental impacts of these models is really, really important to trying to get ahead of things and making them more efficient,” she says.
Pranav Rajpurkar, 28, has developed a way for AI to teach itself to accurately interpret medical images without any help from humans.
His systems can already perform at the level of human experts, flagging pathologies that might otherwise be missed and preventing unnecessary medical procedures due to false positives. Rajpurkar’s newest model, called CheXzero, could improve their performance further and expand the types of images they can handle.
When Rajpurkar introduced an early model allowing computers to read chest x-rays in 2018, there was a problem: a shortage of data. At the time, he and others in the field relied on radiologists to manually label images that AI systems used for learning. Since it takes a few minutes for a person to label a single image, and AI systems require hundreds of thousands of images to understand what they’re looking at, the field soon hit a roadblock.
Rajpurkar’s new approach skips the human labelers altogether by comparing a set of medical images—taken from any number of private or public data sets—with the radiology reports that almost always accompany them. The system can automatically match the images to issues the reports identify in writing. This means that CheXzero can use massive databases to learn to spot potential problems without human input to prepare the data first—a technique known as “self-supervision.”
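One common way to match images with text is to map both into the same vector space and pick the phrase whose vector points in the most similar direction to the image's. The sketch below illustrates that matching step with hand-written toy vectors. It is not CheXzero's code: a real system would produce the embeddings with trained image and text encoders, and the names and numbers here are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Similarity of direction between two vectors, from -1 to 1."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def best_matching_phrase(image_embedding, phrase_embeddings):
    """Return the phrase whose embedding is closest to the image embedding."""
    return max(
        phrase_embeddings,
        key=lambda phrase: cosine_similarity(image_embedding, phrase_embeddings[phrase]),
    )


# Toy vectors standing in for encoder outputs (hypothetical values).
image_embedding = [0.9, 0.1, 0.2]
phrase_embeddings = {
    "pleural effusion": [0.8, 0.2, 0.1],
    "no finding": [0.1, 0.9, 0.3],
}
match = best_matching_phrase(image_embedding, phrase_embeddings)
```

Because the labels come from phrases rather than a fixed list of hand-annotated categories, the same matching step can in principle be pointed at any condition that can be described in words.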
Rajpurkar, who is an assistant professor of biomedical informatics at Harvard Medical School, says his dream is to eventually build a system capable of ingesting a patient’s medical records and then identifying problems doctors may have missed.
When a new generative AI model is released, the chatbot or image generator, or the underlying model’s capabilities, get far more attention than any details such as how and to whom the model is released—whether or not it’s open-sourced or licensed for commercial use, for example. But such decisions are deeply consequential.
More openness, for example, provides more opportunity to audit and evaluate the models—but also for bad actors to take advantage of them. More closed systems may concentrate power but limit the potential for harm.
In 2019, Irene Solaiman, then a researcher and public policy manager at OpenAI, led a new approach to the release of GPT-2, a predecessor to ChatGPT, that considered how to balance certain safeguards so as to minimize harm while increasing openness. Solaiman recommended releasing new models in phases, allowing more time to test them and build in guardrails. OpenAI, Microsoft, and Meta are now using this approach for ChatGPT, the new Bing search, and LLaMA, respectively.
Solaiman, 28, has since left OpenAI and is now at AI startup Hugging Face, where she serves as global public policy director. She continues her work to build clear, standardized processes for how future AI models are released. And she’s continuing her work on other aspects of AI safety as well, including developing ways to ensure that a community’s cultural values are taken into account before new systems are deployed there.
What ultimately motivates her, she says, is a desire to make sure that generative AI works well not only for its developers, but also for “people who aren’t interfacing with generative AI systems, but likely will be affected by AI.” In other words, everyone.
Richard Zhang, 34, a senior research scientist at Adobe, invented the visual similarity algorithms underlying image-generating AI models like Stable Diffusion and StyleGAN.
Zhang began exploring generative AI while completing his PhD at UC Berkeley, where he created a widely used algorithm to colorize black-and-white photos. (This work turned into the Colorize tool in Adobe Photoshop.)
In doing this work, Zhang realized there was no “good objective metric” to train the AI system. “It’s really hard to write a map [of] what makes an image look good to a person,” he says, whether that means realistic colors or image clarity.
Most algorithms use mathematical models to measure how similar different images look to human viewers, but human perception is complex and not easily captured by a math problem. So Zhang built something better: LPIPS, his most influential project to date.
LPIPS is unique in incorporating big data sets of human perceptual judgments into its computations. This has helped it outperform all previous models, many of which had been in use for decades, and become the new standard for perceptual similarity. Without LPIPS, today’s image-generation AI would not be possible.
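In broad strokes, a learned perceptual metric of this kind compares the internal features a neural network computes for two images, then weights the differences using values fitted to human judgments. The sketch below is a heavily simplified illustration of that idea, not the LPIPS implementation: the feature vectors and weights are invented toy values, each "layer" is a single short vector, and the averaging over image positions that a real metric performs is omitted.

```python
import math


def unit_normalize(vec):
    """Scale a feature vector to length 1 (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def perceptual_distance(features_a, features_b, layer_weights):
    """Sum weighted squared differences between normalized features, layer by layer."""
    total = 0.0
    for feat_a, feat_b, weights in zip(features_a, features_b, layer_weights):
        norm_a = unit_normalize(feat_a)
        norm_b = unit_normalize(feat_b)
        total += sum(w * (x - y) ** 2 for w, x, y in zip(weights, norm_a, norm_b))
    return total


# Toy features for two images at two network layers (hypothetical values).
image_a = [[1.0, 0.0], [0.5, 0.5, 0.0]]
image_b = [[0.0, 1.0], [0.5, 0.0, 0.5]]

# Per-channel weights; in a real metric these would be fitted to human ratings.
weights = [[0.6, 0.4], [0.2, 0.5, 0.3]]

same = perceptual_distance(image_a, image_a, weights)
different = perceptual_distance(image_a, image_b, weights)
```

The weights are where human judgment enters: channels that people turn out to be sensitive to count for more, and channels they ignore count for less.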
Since he joined Adobe in 2018, Zhang’s research has been incorporated into commercial software tools, including Photoshop’s landscape mixer and smart portrait features. Zhang has also worked on algorithms that help people detect images generated by AI, which are now part of Adobe Stock’s forensic tools.