And yet the Photo Tourism application had an uncertain future. Though it was a technical revelation, developed in Linux and able to run on Windows, it was still very much a prototype, and the road map for developing it further was unclear.
In the spring of 2006, as Snavely was presenting Photo Tourism at an internal Microsoft workshop, Blaise Agüera y Arcas, then a new employee, walked by and took notice. He had arrived recently thanks to the acquisition of his company, Seadragon, which developed a software application he describes as “a 3-D virtual memory manager for images.” Seadragon’s eye-popping appeal lay in its ability to let users load, browse, and manipulate unprecedented quantities of visual information, and its great technical achievement was its ability to do so over a network. (Photosynth’s ability to work with images from Flickr and the like, however, comes from technology that originated with Photo Tourism.)
Agüera y Arcas and Snavely began talking that day. By the summer of 2006, demos were being presented. The resulting hybrid product, part Photo Tourism and part Seadragon, aggregates a large cluster of like images (whether photos or illustrations), weaving them into a 3-D visual model of their real-world subject. It even lends three-dimensionality to areas where the 2-D photos come together. Each individual image is reproduced with perfect fidelity, but in the transitions between them, Photosynth fills in the perceptual gaps that would otherwise prevent a collection of photos from feeling like part of a broader-perspective image. And besides being a visual analogue of a real-life scene, the “synthed” model is fully navigable. As Snavely explains, “The dominant mode of navigation is choosing the next photo to visit, by clicking on controls, and the system automatically moving the viewpoint in 3-D to that new location. A roving eye is a good metaphor for this.” The software re-creates the photographed subject as a place to be appreciated from every documented angle.
Photosynth’s startling technical achievement is like pulling a rabbit from a hat: it produces a lifelike 3-D interface from the 2-D medium of photography. “This is something out of nothing,” says Alexei A. Efros, a Carnegie Mellon professor who specializes in computer vision. The secret, Efros explains, is the quantity of photographs. “As you get more and more visual data, the quantity becomes quality,” he says. “And as you get amazing amounts of data, it starts to tell you things you didn’t know before.” Thanks to improved pattern recognition, indexing, and metadata, machines can infer three-dimensionality. Sooner than we expect, Efros says, “vision will be the primary sensor for machines, just as it is now for humans.”
Microsoft is demonstrating Photosynth online with photo collections such as this one of Venice’s St. Mark’s Square. The shots in this collection were taken by a single photographer over 10 days.
Credit: Courtesy of Microsoft Live Labs