At last March’s Technology, Entertainment, Design (TED) conference in Monterey, CA, a summit that’s been described as “Davos for the digerati,” the calm-voiced software architect from Microsoft began his demonstration abruptly, navigating rapidly across a sea of images displayed on a large screen. Using Seadragon, a technology that enables smooth, speedy exploration of large sets of text and image data, he dove effortlessly into a 300-megapixel map, zooming in to reveal a date stamp from the Library of Congress in one corner. Then he turned to an image that looked like a bar code but was actually the complete text of Charles Dickens’s Bleak House, zooming in until two crisp-edged typeset characters filled the screen, before breezily reverse-zooming back to the giant quilt of text and images.
Microsoft had acquired Seadragon the previous year–and with it the presenter, Blaise Agüera y Arcas. But Agüera y Arcas had not come to TED just to show off Seadragon. Soon he cut to a panorama tiled together from photos of the Canadian Rockies; the mosaic shifted as he panned across it, revealing a dramatic ridgeline. Next came an aerial view of what appeared to be a model of a familiar building: Notre Dame Cathedral. The model, Agüera y Arcas explained, had been assembled from hundreds of separate images gathered from Flickr. It was a “point cloud”–a set of points in three-dimensional space.
As he talked, Agüera y Arcas navigated teasingly around the periphery of Notre Dame, which repeatedly came alive and dimmed again. The effect of hurtling through shifting images and focal points was softened by subtle transitional effects; it felt like a deliberately slowed reel of frame-by-frame animation. The crowd watched in wonder as Agüera y Arcas pushed deeper into the front view of the building’s archway, ending with a tight close-up of a gargoyle. Some of the images the technology had drawn on were not even strictly photographic: it had searched Flickr for all relevant images, including a poster of the cathedral. What Agüera y Arcas was demonstrating wasn’t video, but neither was it merely a collection of photos, even a gargantuan one. It was also like a map, but an immersive one animated by the dream logic of blurring shapes and shifting perspectives.
This was Photosynth–a technology that analyzes related images and links them together to re-create physical environments in a dazzling virtual space. The technology creates a “metaverse,” Agüera y Arcas said (for more on the nascent blending of mapping technologies like Google Earth and the fantastic realms of games like Second Life, see “Second Earth,” July/August 2007); but it also constitutes the “long tail” of Virtual Earth, Microsoft’s competitor to Google Earth, because of its ability to draw from and contribute to the wealth of local mapping and image data available online. It could provide “immensely rich virtual models of every interesting part of the earth,” he said, “collected not just from overhead flights and from satellite images and so on, but from the collective memory.” At which point the presentation ended as abruptly as it had begun some six minutes earlier. Agüera y Arcas’s concluding statement met with a thunder of applause.
Beyond Image Stitching
Photosynth was born from what Agüera y Arcas calls the marriage of Seadragon and Photo Tourism, a Microsoft project intended to revolutionize the way photo sets are packaged and displayed. Photo Tourism had begun as the doctoral thesis of a zealous 26-year-old University of Washington graduate student named Noah Snavely. One of Snavely’s advisors was Rick Szeliski, a computer-vision researcher at Microsoft Research, the company’s R&D arm. “I described the need for the good elements of a strong slide show, like great composition,” recalls Szeliski, whose earlier work at Microsoft had helped develop the image-stitching technology now commonly used in digital cameras to fill a wider or taller frame. He also sought fluidity between images and a sense of interactivity in viewing them.