Artificial intelligence

Facebook’s AI tourist finds its way around New York City by asking for help from another algorithm

AI algorithms can learn to navigate in the real world using language—and that might help make chatbots and voice assistants smarter.
July 12, 2018

If you get lost in New York without a smartphone or a map, you’ll most likely ask a local for directions. Facebook’s researchers are training AI programs to do the same thing, and they’re hoping this could eventually make them far better at using language.

The Facebook Artificial Intelligence Research (FAIR) group in New York created two AI programs: a “tourist” effectively lost in the Big Apple, and a “guide” designed to help its fellow algorithm find its way around by offering natural-language instructions. The lost tourist sees photos of the real world, while the “guide” sees a 2-D map with landmarks. Together they are tasked with reaching a specific destination.
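The tourist/guide setup can be pictured as a simple message-passing loop. The sketch below is a hypothetical illustration only: the toy grid, landmark names, and hand-written guide rule are assumptions for clarity, whereas FAIR's actual models learn these behaviors from data and communicate in natural language rather than following hard-coded rules.

```python
# Toy sketch of the "tourist asks, guide directs" loop described above.
# The grid world, landmarks, and rule-based guide are hypothetical; the
# real Talk the Walk agents are learned neural models.

# The guide's view: a 2-D map with landmark coordinates.
LANDMARKS = {"restaurant": (0, 0), "hotel": (2, 1), "bank": (1, 3)}

# The instructions the guide can give, and how the tourist executes them.
MOVES = {"go north": (0, 1), "go south": (0, -1),
         "go east": (1, 0), "go west": (-1, 0)}

def guide(tourist_report, destination):
    """Guide: compare the tourist's reported position against the map
    and reply with a single natural-language instruction."""
    x, y = tourist_report  # in the real task this is inferred from dialogue
    tx, ty = LANDMARKS[destination]
    if (x, y) == (tx, ty):
        return "stop"
    if x != tx:
        return "go east" if x < tx else "go west"
    return "go north" if y < ty else "go south"

def navigate(start, destination, max_steps=20):
    """Tourist: repeatedly report where it is, follow the guide's
    instruction, and halt when told to stop."""
    pos = start
    for _ in range(max_steps):
        instruction = guide(pos, destination)
        if instruction == "stop":
            return pos
        dx, dy = MOVES[instruction]
        pos = (pos[0] + dx, pos[1] + dy)
    return pos
```

In this toy version the guide sees the tourist's exact coordinates; in the actual task the guide must infer the tourist's location purely from its verbal descriptions of nearby landmarks, which is what makes the language grounding hard.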

The idea is that by learning how instructions relate to real objects like a “restaurant” or a “hotel,” just as a baby learns by associating words with real objects and actions, the tourist algorithm will start to figure out what these things actually are—or at least how they fit into a simple street view of the world. AI researchers hope that algorithms taught this way will be more sophisticated in their use of language.

Language remains a huge challenge for artificial intelligence. It’s easy to build algorithms capable of answering simple commands or even holding a rudimentary conversation, but complex dialogue remains beyond today’s machines. This is partly because decoding ambiguity in language requires some common-sense knowledge of the real world. Giving an algorithm simple rules or training it on large amounts of text often results in absurd misunderstandings (see “AI’s language problem”).

“One strategy for eventually building AI with human-level language understanding is to train those systems in a more natural way, by tying language to specific environments,” the researchers write in a related blog post. “Just as babies first learn to name what they can see and touch, this approach—sometimes referred to as ‘embodied AI’—favors learning in the context of a system’s surroundings, rather than training through large data sets of text.”

The Facebook research is an attempt to give AI algorithms some common sense by grounding their understanding of language in a simplified representation of the real world.

The idea of “embodied AI” has been around for some time, but most efforts to date have relied on simulated environments rather than actual images. Greater realism makes things more challenging, but it will be crucial if AI algorithms are to become more useful (see “Facebook helped create an AI scavenger hunt”).

The researchers used a 360° camera to capture New York City neighborhoods including Hell’s Kitchen, the Financial District, the Upper East Side, and Williamsburg.

They also ran experiments in which the two algorithms were free to develop their own communication protocols instead of using natural language. Interestingly, the researchers found that navigation worked best when the algorithms were allowed to do this.

The Facebook researchers are releasing the code behind their project, called Talk the Walk, in hopes that other AI scientists will use it to further research on embodied AI and language algorithms.

