Merging Video with Maps

A new system uses panoramic images to create navigation videos that highlight turns and landmarks.

Rachel Kremen

October 14, 2009

A novel navigation system under development at Microsoft aims to tweak users’ visual memory with carefully chosen video clips of a route. Developed with researchers from the University of Konstanz in Germany, the software creates video using 360-degree panoramic images of the street that are strung together. Such images have already been gathered by several different mapping companies for many roads around the world. The navigation system, called Videomap, adjusts the speed of the video and the picture to highlight key areas along the route.

**Drive-through:** New navigation software uses panoramic images to create a video preview of the route. As the video plays, users can also follow the route on the map (shown in green).

“What we wanted to do is build a system where we could give [drivers] those visual cues before they got into the car,” says Billy Chen, a researcher in the MSN Advanced Engineering group. Ideally, he says, the driver would feel as if she’s driven the route before, even if she’s never been on those streets.

Videomap still provides written directions and a map with a highlighted route. But unlike existing software, such as Google Maps or MapQuest, the system also allows users to watch a video of their drive. The video slows down to highlight turns and speeds up elsewhere to minimize the total length of the clip. Memorable landmarks are also highlighted, though at present the researchers have to select them from the video manually.

“As we pass a landmark, the field of view will expand to encompass that landmark and create a landmark thumbnail [image],” Chen says. The video freezes on this image for a few seconds to imprint it in the driver’s memory, so that she will recognize it during the drive.

Algorithms also automatically adjust the video to incorporate something Chen calls “turn anticipation.” Before a right-hand turn, for example, the video will slow down and focus on images on the right-hand side of the street. This smooths out the video and draws the driver’s attention to the turn. Still images of the street at each turn are also embedded in the map and the written directions.
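The article does not describe how Videomap computes these adjustments, so the following is only a rough sketch of the idea, not the researchers' method. It assumes a simple linear ramp: within a hypothetical "slow-down radius" of a turn, playback speed falls from a cruising rate to real time while the view pans toward the side of the turn. The function names and all parameter values are illustrative assumptions.

```python
def playback_speed(dist_to_turn_m, slow_radius_m=100.0, cruise=8.0, slow=1.0):
    """Return a playback-speed multiplier for the current frame.

    Assumption: speed ramps linearly from `cruise` down to `slow`
    as the distance to the next turn shrinks from `slow_radius_m` to 0.
    """
    if dist_to_turn_m >= slow_radius_m:
        return cruise
    t = max(dist_to_turn_m, 0.0) / slow_radius_m
    return slow + (cruise - slow) * t


def view_yaw_deg(dist_to_turn_m, turn_direction, slow_radius_m=100.0, max_yaw=45.0):
    """Return how far the view pans toward the upcoming turn, in degrees.

    0 means straight ahead; positive values look right, negative look left.
    Assumption: the pan grows linearly as the turn approaches.
    """
    if dist_to_turn_m >= slow_radius_m:
        return 0.0
    t = 1.0 - max(dist_to_turn_m, 0.0) / slow_radius_m
    sign = 1.0 if turn_direction == "right" else -1.0
    return sign * max_yaw * t


if __name__ == "__main__":
    # Approaching a right-hand turn: the clip slows and the view swings right.
    for d in (200.0, 100.0, 50.0, 0.0):
        print(d, playback_speed(d), view_yaw_deg(d, "right"))
```

Under these assumptions, a frame far from any turn plays at the full cruising rate with the camera facing forward, while a frame at the turn itself plays in real time with the view fully panned toward the turn side. A real system would likely use a smoother easing curve than a straight line.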

The system was tested on 20 users, using images of streets in Austria. The participants were first given driving directions using the standard map and text, as well as thumbnails for each intersection. Each participant was allotted five minutes to study the information. The drivers were then shown a video simulation of the drive and asked which way the car should turn at various points along the way. They were then asked to do the same thing for a different route, this time using Videomap directions.

When given Videomap directions, drivers made the correct turn 80 percent of the time. With a map and text directions, the drivers made the correct turn only 60 percent of the time. “The results are pretty conclusive,” Chen says. He adds that the drivers also didn’t have to look at their printed material as often after watching Videomap directions. Furthermore, the majority of users preferred Videomap.

While Chen was happy with the results, he would like to perform the tests again. The initial tests used the same video for both the Videomap and the simulation, although the simulation video was not sped up or enhanced in any way. Chen would like to see how much the visual cues help when the season or lighting is different in the simulation.

“We also want to see if we can improve [the] interface itself,” he says. “The map is currently synchronized to the video, so that the map is moving when [the] video is playing.” This divides the user’s attention between the video and the map, Chen says. He hopes to find a way to reliably draw the user’s attention to the video when approaching a landmark, for example.

Arzu Coltekin, a senior researcher at the University of Zurich who works in the Geographic Information Visualization and Analysis Division, finds the work interesting. Some might say that a system such as Videomap isn’t necessary because of the proliferation of GPS receivers in cars, but Coltekin notes that it would still be useful for those who bike or walk, which “is quite common in Europe. And when you are walking or biking, often you don’t have a GPS.” But she says the team needs to come up with a way to automatically identify landmarks.

Chen says that Microsoft could use a list of landmarks that is already in its geospatial database, or such a list could perhaps be compiled by users.
