There’s a new way to tame language AI so it doesn’t embarrass you

Models can now be steered to generate text based on the topic or sentiment of your choosing.

Karen Hao

December 18, 2019

An image of text output by AI model GPT-2. (Image: Ms. Tech)

In the last two years, the AI subfield of natural-language processing has seen enormous progress. For example, a language model developed by the San Francisco–based research lab OpenAI, called GPT-2, has been used to generate fiction, fake news articles, and a practically infinite Choose Your Own Adventure–style text game.

But these kinds of models are essentially massive text-prediction systems that don’t take meaning into account, so the sentences they produce are more likely to be superficially fluent than truly meaningful. It’s hard to tell a model to stick to a particular topic like health care, for example. Models like GPT-2 can also be gamed into producing racist and toxic output, which makes them even less useful.

Now researchers at Uber AI have developed a way to steer these language models, making it easier for users to specify the topic or even the sentiment of the sentences they generate. Given the prompt “The issue focused on,” for example, a model told to focus on the military might produce an output like this: “The issue focused on the fact that the government had spent billions on the military and that it could not deploy the troops in time.” If it were instead told to focus on politics, the output might be more like this: “The issue focused on a single section of the legislation. It’s unclear whether the committee will vote to extend the law.”

While the model still doesn’t understand meaning, the technique brings more control. It takes us one step closer to bringing the leaps in AI-generated language to more domain-specific applications, like health care or financial services chatbots. It could also be used to guide models away from producing offensive results.

The technique uses two separate statistical models. The first is simply the original language model, like GPT-2, which constructs sentences based on the probabilities of certain words appearing next to others. The second model judges how well the first model’s output displays a desired attribute—whether it’s sticking to a prescribed topic or a particular sentiment, for example. If the desired attribute is a topic like space, the model might score the first model’s output on how many relevant words it contains, such as “planet,” “galaxy,” and “orbit.” If the attribute is a sentiment like positivity, the evaluation model could be trained to score the emotional content of its words.
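To make the scoring idea concrete, here is a minimal sketch of a bag-of-words evaluation model for the space example. The word list `SPACE_WORDS` and the function `bag_of_words_score` are hypothetical illustrations, not code from the Uber AI team; a sentiment attribute would instead need a trained classifier, which is not shown here.

```python
import re

# Hypothetical word list for the topic "space", echoing the article's example.
SPACE_WORDS = {"planet", "galaxy", "orbit"}


def bag_of_words_score(text, topic_words):
    """Return the fraction of words in `text` that appear in `topic_words`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in topic_words)
    return hits / len(tokens)


# 2 of the 7 words ("orbit", "planet") are on the list, so the score is 2/7.
print(bag_of_words_score("The probe entered orbit around the planet", SPACE_WORDS))
```

A score this crude rewards keyword stuffing, which is one reason the language model's own fluency has to stay in the loop.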

When an initial prompt is fed into the first model, it begins the process of predicting subsequent words. But after every word, it consults the evaluation model’s score and readjusts on the basis of that feedback. The final sentence ends up with the desired attribute while retaining the giant language model’s fluency.
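The word-by-word feedback loop can be illustrated with a toy example. This is a hedged sketch assuming a tiny hand-written bigram table (`TOY_LM`, hypothetical) in place of GPT-2 and keyword sets in place of the evaluation model. It swaps in a simple technique, reweighting the next-word probabilities after every step (often called weighted decoding); the method in the Uber paper is more involved, using gradients from the attribute model to nudge the language model’s internal activations.

```python
import math

# Hypothetical toy bigram "language model": probabilities of the next word given
# the previous one. It stands in for a real model such as GPT-2.
TOY_LM = {
    "on": {"the": 1.0},
    "the": {"budget": 0.5, "troops": 0.2, "committee": 0.3},
    "budget": {"<end>": 1.0},
    "troops": {"<end>": 1.0},
    "committee": {"<end>": 1.0},
}

# Hypothetical bag-of-words attribute models for two topics.
MILITARY = {"troops", "army", "weapons"}
POLITICS = {"committee", "legislation", "vote"}


def steered_distribution(prev_word, attribute_words, strength):
    """Boost on-topic candidates by exp(strength), then renormalize."""
    weighted = {
        word: p * math.exp(strength * (1.0 if word in attribute_words else 0.0))
        for word, p in TOY_LM[prev_word].items()
    }
    total = sum(weighted.values())
    return {word: w / total for word, w in weighted.items()}


def generate(prompt, attribute_words, strength=2.0, max_steps=10):
    """Greedy decoding that re-scores candidates with the attribute model after every word."""
    words = prompt.lower().split()
    for _ in range(max_steps):
        if words[-1] not in TOY_LM:
            break  # the toy model has no continuation for this word
        dist = steered_distribution(words[-1], attribute_words, strength)
        next_word = max(dist, key=dist.get)
        if next_word == "<end>":
            break
        words.append(next_word)
    return " ".join(words)


print(generate("The issue focused on", set(), strength=0.0))  # the issue focused on the budget
print(generate("The issue focused on", MILITARY))  # the issue focused on the troops
print(generate("The issue focused on", POLITICS))  # the issue focused on the committee
```

Raising `strength` pushes harder toward the attribute; at `strength=0.0` the sketch falls back to the unsteered model’s most likely word.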

The new method is very flexible and can combine multiple goals. It could be directed to write about cooking with a negative tone, for example. It also has the benefit of being computationally efficient. Other methods can focus the output of a language model toward specific topics or emotions, but they can require significant retraining. At the scale of GPT-2, this is both environmentally and financially expensive. “A grad student like me doesn’t have those resources,” says Sumanth Dathathri, who studies at Caltech and coauthored the paper during an internship with Uber. The new method avoids retraining entirely by granting more control over whatever model already exists.
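Combining goals, such as cooking with a negative tone, can be sketched by adding up several attribute scores with weights. This is a simplification under stated assumptions: it reranks a few whole candidate sentences with made-up language-model probabilities rather than steering word by word, and the word lists, weights, and function names are hypothetical. It does show why no retraining is needed: the language model’s probabilities are fixed inputs, and only the weights change.

```python
import math
import re

# Hypothetical word lists standing in for two separate attribute models.
COOKING = {"oven", "recipe", "bake", "soup", "sauce"}
NEGATIVE = {"bland", "burnt", "awful", "soggy"}


def fraction_in(text, words):
    """Share of the words in `text` that belong to `words`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(token in words for token in tokens) / len(tokens) if tokens else 0.0


def combined_score(text, lm_prob, weights):
    """Language-model log-probability plus a weighted sum of attribute scores."""
    attribute_bonus = (
        weights["cooking"] * fraction_in(text, COOKING)
        + weights["negative"] * fraction_in(text, NEGATIVE)
    )
    return math.log(lm_prob) + attribute_bonus


def pick(candidates, weights):
    """Return the candidate text with the highest combined score."""
    return max(candidates, key=lambda c: combined_score(c[0], c[1], weights))[0]


# Hypothetical candidates with made-up language-model probabilities.
CANDIDATES = [
    ("The weather was lovely today", 0.5),
    ("The soup from this recipe was delicious", 0.3),
    ("The soup was bland and the sauce was burnt", 0.2),
]

print(pick(CANDIDATES, {"cooking": 0.0, "negative": 0.0}))  # most fluent: the weather
print(pick(CANDIDATES, {"cooking": 5.0, "negative": 0.0}))  # cooking only
print(pick(CANDIDATES, {"cooking": 5.0, "negative": 5.0}))  # cooking with a negative tone
```

With both weights at zero the most probable sentence wins; turning on the cooking weight favors the recipe sentence, and adding the negative weight tips the choice to the bland, burnt soup.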

The team foresees this technique being used in many different applications, whether dialogue systems, translation systems, or even art. In 2016, the lab developed a similar method for controlling generation of images rather than language. “There were a lot of artists that used it to produce beautiful stuff,” recalls Jason Yosinski, a founding member of Uber AI who oversaw the paper. “I could see many other artists doing the same here.”
