DeepMind thinks that we imagine the future so well because part of our brain creates efficient summaries of how the future could play out.
For all of the recent advances in AI, machines still struggle to plan effectively in situations where even a few procedural steps cause a combinatorial explosion in complexity. We’ve seen that in AI’s struggle to master, say, the computer game StarCraft. In contrast, humans are pretty good at it: chances are you can quickly imagine how to handle a whole set of different dinner scenarios if, say, the bodega turns out to be closed on your way home from work.
Now, in a paper published in Nature Neuroscience, a team of researchers from Google’s AI division draws parallels between reinforcement learning—the field of machine learning where an AI learns to perform a task through trial and error by being rewarded when it does it correctly—and the brain’s hippocampus, to understand why humans have that edge.
While the hippocampus is usually thought to deal with a human’s current situation, DeepMind proposes that it actually makes predictions about the future, too. From a blog post describing the new work:
We argue that the hippocampus represents every situation—or state—in terms of the future states which it predicts. For example, if you are leaving work (your current state) your hippocampus might represent this by predicting that you will likely soon be on your commute, picking up your kids from school or, more distantly, at home. By representing each current state in terms of its anticipated successor states, the hippocampus conveys a compact summary of future events. We suggest that this specific form of predictive map allows the brain to adapt rapidly in environments with changing rewards, but without having to run expensive simulations of the future.
Of course, it’s not clear that this is the case. Nor is it clear that this alone is what makes humans good at planning. But DeepMind plans to try to work out whether its new theory could help AIs plan more efficiently by applying a mathematical implementation of the idea—where each future state can be assigned its own reward in order to calculate an optimal decision—inside neural networks. And if it works, the machines may just get a little bit better at thinking ahead.
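The mathematical idea DeepMind describes—representing each state by its expected future states, then combining that map with rewards—is known in reinforcement learning as the successor representation. Here is a minimal sketch in NumPy of how it works; the four-state chain, its transition probabilities, and the reward vectors are all illustrative assumptions, not anything from the paper:

```python
import numpy as np

# A tiny four-state chain under a fixed policy: each state moves to the
# next, and the last state is absorbing. (This toy world is an assumption
# made up purely for illustration.)
P = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])
gamma = 0.9  # discount factor: how much the future is down-weighted

# The successor representation M[s, s'] is the expected discounted number
# of future visits to s' when starting from s. For a fixed policy it has
# a closed form: M = (I - gamma * P)^-1
M = np.linalg.inv(np.eye(4) - gamma * P)

# Values for any reward assignment come from one matrix-vector product --
# no simulation of the future is needed.
R1 = np.array([0.0, 0.0, 0.0, 1.0])  # reward only at the final state
V1 = M @ R1

# If the rewards change, the same map M is simply recombined:
R2 = np.array([0.0, 1.0, 0.0, 0.0])  # reward moved to the second state
V2 = M @ R2
```

This is the “rapid adaptation” the blog post describes: the expensive part (building M) is done once, and new reward structures only require the cheap recombination step.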