An AI can simulate an economy millions of times to create fairer tax policy

Deep reinforcement learning has trained AIs to beat humans at complex games like Go and StarCraft. Could it also do a better job at running the economy?

Will Douglas Heaven

May 5, 2020

Tony Webster / Flickr

Income inequality is one of the overarching problems of economics. One of the most effective tools policymakers have to address it is taxation: governments collect money from people according to what they earn and redistribute it either directly, via welfare schemes, or indirectly, by using it to pay for public projects. But though more taxation can lead to greater equality, taxing people too much can discourage them from working or motivate them to find ways to avoid paying—which reduces the overall pot.

Getting the balance right is not easy. Economists typically rely on assumptions that are hard to validate. People’s economic behavior is complex, and gathering data about it is hard. Decades of economic research have wrestled with designing the best tax policy, but it remains an open problem.

Scientists at the US business technology company Salesforce think AI can help. Led by Richard Socher, the team has developed a system called the AI Economist that uses reinforcement learning, the same sort of technique behind DeepMind’s AlphaGo and AlphaZero, to identify optimal tax policies for a simulated economy. The tool is still relatively simple (there’s no way it could include all the complexities of the real world or human behavior), but it is a promising first step toward evaluating policies in an entirely new way. “It would be amazing to make tax policy less political and more data driven,” says team member Alex Trott.

In one early result, the AI found a policy that—in terms of maximizing both productivity and income equality—was 16% fairer than a state-of-the-art progressive tax framework studied by academic economists. The improvement over current US policy was even greater. “I think it's a totally interesting idea,” says Blake LeBaron at Brandeis University in Massachusetts, who has used neural networks to model financial markets.
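To make "maximizing both productivity and income equality" concrete, here is a minimal sketch of one way such a score could be computed. It assumes productivity is total income and equality is one minus the Gini coefficient, multiplied together. The function names and the exact formula are illustrative assumptions, not necessarily the metric the Salesforce team used.

```python
def gini(incomes):
    """Gini coefficient: 0 means perfectly equal incomes; higher means more unequal."""
    n = len(incomes)
    total = sum(incomes)
    if n == 0 or total == 0:
        return 0.0
    # Sum of absolute differences over all ordered pairs, normalized by 2 * n * total.
    diff_sum = sum(abs(a - b) for a in incomes for b in incomes)
    return diff_sum / (2 * n * total)


def fairness_score(incomes):
    """Hypothetical score: productivity (total income) times equality (1 - Gini)."""
    productivity = sum(incomes)
    equality = 1.0 - gini(incomes)
    return productivity * equality


# Two four-worker economies with the same total income of 40.
equal_economy = [10, 10, 10, 10]
unequal_economy = [0, 0, 0, 40]
print(fairness_score(equal_economy), fairness_score(unequal_economy))
```

Under this assumed score, the two economies produce the same total income, but the one where a single worker earns everything scores much lower, which is the kind of trade-off a policymaker would be asked to balance.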

In the simulation, four AI workers are each controlled by their own reinforcement-learning models. They interact with a two-dimensional world, gathering wood and stone and either trading these resources with others or using them to build houses, which earns them money. The workers have different levels of skill, which leads them to specialize. Lower-skilled workers learn they do better if they gather resources, and higher-skilled ones learn they do better if they buy resources to build houses. At the end of each simulated year, all workers are taxed at a rate devised by an AI-controlled policymaker, which is running its own reinforcement-learning algorithm. The policymaker’s goal is to boost both the productivity and the income of all workers. The AIs converge on optimal behavior by repeating the simulation millions of times.
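The end-of-year tax step can be sketched in a few lines. This is a toy illustration, not the team's code: it assumes a marginal bracket schedule and assumes the collected pot is split equally among workers. The bracket thresholds, rates, and function names are all hypothetical.

```python
def tax_owed(income, brackets):
    """Tax under a marginal schedule.

    brackets: list of (lower_bound, rate) pairs sorted by lower_bound, starting at 0.
    Each rate applies only to the slice of income between its bound and the next one.
    """
    owed = 0.0
    for i, (lower, rate) in enumerate(brackets):
        upper = brackets[i + 1][0] if i + 1 < len(brackets) else float("inf")
        if income > lower:
            owed += (min(income, upper) - lower) * rate
    return owed


def end_of_year(incomes, brackets):
    """Collect taxes, then split the pot equally (an assumed redistribution rule)."""
    taxes = [tax_owed(x, brackets) for x in incomes]
    rebate = sum(taxes) / len(incomes)
    return [x - t + rebate for x, t in zip(incomes, taxes)]


# Hypothetical schedule: 10% on the first 50 coins, 30% on everything above.
brackets = [(0, 0.10), (50, 0.30)]
print(end_of_year([40, 100], brackets))
```

In the system the article describes, the policymaker would be learning the rates in `brackets` rather than having them fixed by hand, and the workers would be adjusting how much they earn in response.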

Both reinforcement-learning models start from scratch, with no prior knowledge of economic theory, and learn how to act through trial and error—in much the same way that DeepMind’s AIs learn, with no human input, to play Go and chess at superhuman levels.

Can you learn much from only four AI workers? In theory, yes, because simple interactions between a handful of agents soon lead to very complex behaviors. (For all its complexity, Go still involves only two players, for example.) Even so, everyone involved in the project agrees that increasing the number of workers in the simulation will be essential if the tool is to model realistic scenarios.

Gaming the system

The double dose of AI is key. Neural networks have been used to control agents in simulated economies before. But making the policymaker an AI as well leads to a model in which the workers and policymaker continually adapt to each other’s actions. This dynamic environment was a challenge for the reinforcement-learning models, since a strategy learned under one tax policy may not work so well under another. But it also meant the AIs found ways to game the system. For example, some workers learned to avoid tax by reducing their productivity to qualify for a lower tax bracket and then increasing it again. The Salesforce team says this give-and-take between workers and policymaker leads to a simulation more realistic than anything achieved by previous models, where tax policies are typically fixed.

The tax policy that the AI Economist came up with is a little unusual. Unlike most existing policies, which are either progressive (that is, higher earners are taxed more) or regressive (higher earners are taxed less), the AI’s policy cobbled together aspects of both, applying the highest tax rates to rich and poor and the lowest to middle-income workers. Like many solutions that AIs come up with—such as some of AlphaZero’s game-winning moves—the result appears counterintuitive and not something that a human might have devised. But its impact on the economy led to a smaller gap between rich and poor.
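The difference between these schedule shapes is easy to see when the marginal rates are listed from the lowest income bracket to the highest. The sketch below labels a schedule by whether its rates rise, fall, or do both. The rates shown are made-up numbers for illustration, not the ones the AI Economist produced.

```python
def classify_schedule(rates):
    """Label marginal rates ordered from the lowest to the highest income bracket."""
    pairs = list(zip(rates, rates[1:]))
    rising = all(a <= b for a, b in pairs)
    falling = all(a >= b for a, b in pairs)
    if rising and falling:
        return "flat"
    if rising:
        return "progressive"
    if falling:
        return "regressive"
    return "mixed"


# Hypothetical schedules (low, middle, high income brackets).
print(classify_schedule([0.10, 0.20, 0.30]))  # rates rise with income
print(classify_schedule([0.30, 0.20, 0.10]))  # rates fall with income
print(classify_schedule([0.40, 0.15, 0.45]))  # high at both ends, low in the middle
```

A schedule like the third one, with the lowest rate in the middle, fits neither textbook category, which is the sense in which the AI's policy combined aspects of both.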

To see if the AI-generated tax policy would influence human behavior in a similar way, the team tested it on more than 100 crowdworkers hired through Amazon’s Mechanical Turk, who were asked to take control of the workers in the simulation. They found that the policy encouraged the humans to play in much the same way as the AIs, suggesting—at least in principle—that the AI Economist could be used to influence real economic activity.

Endless tweaking

Another advantage of an AI-powered simulation is that you can tweak parameters to explore different scenarios. For example, it would be possible to model the impact of a pandemic by adding constraints such as social distancing and restricted access to resources, or by removing people from the workforce. “It's hard to come up with optimal tax theories based on the past if the future looks very different,” says Socher.

The ability of the simulation to model change is a big plus, says LeBaron: “It’s pretty interesting to see the workers adjusting themselves to the tax code.” This gets around one of the big criticisms of existing tax models in which behavior is typically fixed, he says.

LeBaron’s main reservation is the small number of agents the tool is limited to so far. “There are people who argue you can get deep intellectual insights with just a few agents,” he says. “I'm not one of them.” He would like to see it simulate around 100 workers—which is also a figure the Salesforce team is aiming for.

But LeBaron believes the tool could already be used to sanity-check existing economic models: “If I were a policymaker, I would fire this thing up to see what it says.” If the AI Economist disagreed with other models, then it could be a sign those other models were missing something, he says.

David Parkes, a computer scientist and economist at Harvard University who collaborated with the Salesforce team, is also optimistic. He agrees they need to increase the number of agents significantly. But once they have done that and added a few extra features such as companies to the simulation, he anticipates being able to replicate existing theoretical results. “Then it immediately becomes useful,” he says.

Doyne Farmer, an economist at the University of Oxford, is less convinced, however. Though he welcomes the crossover of reinforcement learning from games to economics—“It gets at the question of whether you can investigate policies in the same way that AlphaZero plays Go”—he thinks it will be some time before the tool is actually useful. “The real world is way too complicated,” he says.

The team accepts that some economists will need persuading. To that end, they are releasing their code and inviting others to run their own models through it. In the long run, this openness will also be an important part of making such tools trustworthy, says Socher. “If you are using an AI to recommend that certain people get lower or higher taxes,” he points out, “you’d better be able to say why.”
