Worried about your firm’s AI ethics? These startups are here to help.

A growing ecosystem of “responsible AI” ventures promises to help organizations monitor and fix their AI models.

Karen Hao

January 15, 2021

Rumman Chowdhury’s job used to involve a lot of translation. As the “responsible AI” lead at the consulting firm Accenture, she would work with clients struggling to understand their AI models. How did they know if the models were doing what they were supposed to? The confusion often came about partly because the company’s data scientists, lawyers, and executives seemed to be speaking different languages. Her team would act as the go-between so that all parties could get on the same page. It was inefficient, to say the least: auditing a single model could take months.

So in late 2020, Chowdhury left her post to start her own venture. Called Parity AI, it offers clients a set of tools that seek to shrink the process down to a few weeks. It first helps them identify how they want to audit their model—is it for bias or for legal compliance?—and then provides recommendations for tackling the issue.

Parity is among a growing crop of startups promising organizations ways to develop, monitor, and fix their AI models. They offer a range of products and services from bias-mitigation tools to explainability platforms. Initially most of their clients came from heavily regulated industries like finance and health care. But increased research and media attention on issues of bias, privacy, and transparency have shifted the focus of the conversation. New clients are often simply worried about being responsible, while others want to “future proof” themselves in anticipation of regulation.

“So many companies are really facing this for the first time,” Chowdhury says. “Almost all of them are actually asking for some help.”

From risk to impact

When working with new clients, Chowdhury avoids using the term “responsibility.” The word is too squishy and ill-defined; it leaves too much room for miscommunication. She instead begins with more familiar corporate lingo: the idea of risk. Many companies have risk and compliance arms, and established processes for risk mitigation.

AI risk mitigation is no different. A company should start by considering the different things it worries about. These can include legal risk, the possibility of breaking the law; organizational risk, the possibility of losing employees; or reputational risk, the possibility of suffering a PR disaster. From there, it can work backwards to decide how to audit its AI systems. A finance company, operating under the fair lending laws in the US, would want to check its lending models for bias to mitigate legal risk. A telehealth company, whose systems train on sensitive medical data, might perform privacy audits to mitigate reputational risk.
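
As a rough illustration of what a bias check on a lending model might involve, the sketch below compares approval rates across two groups of applicants and reports the ratio of the lowest rate to the highest. It is a minimal sketch under stated assumptions, not Parity's method or any regulator's test: the group labels, the sample decisions, and the function names are hypothetical, and the ratio is only one of many possible fairness measures.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Return the approval rate per group.

    `decisions` is an iterable of (group, approved) pairs, where
    `approved` is True if the model approved the application.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        if ok:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def approval_rate_ratio(decisions):
    """Ratio of the lowest group approval rate to the highest.

    A value near 1.0 means groups are approved at similar rates.
    Returns None when no group has any approvals.
    """
    rates = approval_rates(decisions)
    highest = max(rates.values())
    if highest == 0:
        return None
    return min(rates.values()) / highest


# Hypothetical model decisions: group "A" approved 8 of 10 times,
# group "B" approved 5 of 10 times.
sample = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 5 + [("B", False)] * 5
)

rates = approval_rates(sample)
ratio = approval_rate_ratio(sample)
print(rates, ratio)
```

What counts as an acceptable ratio is a legal and policy question that the code cannot answer; a company would set that threshold with its lawyers, which is the kind of cross-team conversation the audit is meant to support.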

[Image: a screenshot of Parity's library of impact assessment questions. Caption: Parity includes a library of suggested questions to help companies evaluate the risk of their AI models.]

Parity helps to organize this process. The platform first asks a company to build an internal impact assessment—in essence, a set of open-ended survey questions about how its business and AI systems operate. It can choose to write custom questions or select them from Parity’s library, which has more than 1,000 prompts adapted from AI ethics guidelines and relevant legislation from around the world. Once the assessment is built, employees across the company are encouraged to fill it out based on their job function and knowledge. The platform then runs their free-text responses through a natural-language processing model and translates them with an eye toward the company’s key areas of risk. Parity, in other words, serves as the new go-between in getting data scientists and lawyers on the same page.

Next, the platform recommends a corresponding set of risk mitigation actions. These could include creating a dashboard to continuously monitor a model’s accuracy, or implementing new documentation procedures to track how a model was trained and fine-tuned at each stage of its development. It also offers a collection of open-source frameworks and tools that might help, like IBM’s AI Fairness 360 for bias monitoring or Google’s Model Cards for documentation.

Chowdhury hopes that if companies can reduce the time it takes to audit their models, they will become more disciplined about doing it regularly and often. Over time, she hopes, this could also open them to thinking beyond risk mitigation. “My sneaky goal is actually to get more companies thinking about impact and not just risk,” she says. “Risk is the language people understand today, and it’s a very valuable language, but risk is often reactive and responsive. Impact is more proactive, and that’s actually the better way to frame what it is that we should be doing.”

A responsibility ecosystem

While Parity focuses on risk management, another startup, Fiddler, focuses on explainability. CEO Krishna Gade began thinking about the need for more transparency in how AI models make decisions while serving as the engineering manager of Facebook’s News Feed team. After the 2016 presidential election, the company made a big internal push to better understand how its algorithms were ranking content. Gade’s team developed an internal tool that later became the basis of the “Why am I seeing this?” feature.

Gade launched Fiddler shortly after that, in October 2018. It helps data science teams track their models’ evolving performance, and creates high-level reports for business executives based on the results. If a model’s accuracy deteriorates over time, or it shows biased behaviors, Fiddler helps debug why that might be happening. Gade sees monitoring models and improving explainability as the first steps to developing and deploying AI more intentionally.

Arthur, founded in 2019, and Weights & Biases, founded in 2017, are two more companies that offer monitoring platforms. Like Fiddler, Arthur emphasizes explainability and bias mitigation, while Weights & Biases tracks machine-learning experiments to improve research reproducibility. All three companies have observed a gradual shift in companies’ top concerns, from legal compliance or model performance to ethics and responsibility.

“The cynical part of me was worried at the beginning that we would see customers come in and think that they could just check a box by associating their brand with someone else doing responsible AI,” says Liz O’Sullivan, Arthur’s VP of responsible AI, who also serves as the technology director of the Surveillance Technology Oversight Project, an activist organization. But many of Arthur’s clients have sought to think beyond just technical fixes to their governance structures and approaches to inclusive design. “It’s been so exciting to see that they really are invested in doing the right thing,” she says.

O’Sullivan and Chowdhury are also both excited to see more startups like theirs coming online. “There isn’t just one tool or one thing that you need to be doing to do responsible AI,” O’Sullivan says. Chowdhury agrees: “It’s going to be an ecosystem.”
