In fact, parallax isn’t strictly required for 3-D vision: if you shut one eye, the world doesn’t go flat. The brain infers depth using all sorts of cues such as shading, color, motion, and our learned experience about the spatial relationships between floors and walls, or between streets and buildings. “It turns out that using a fairly simple model–thinking of the world in terms of a ground surface, vertical surfaces that stick up out of it, and the sky–you can create pretty compelling 3-D models,” says Hoiem.
The software that he, Efros, and Hebert developed starts converting an image by trying to group each pixel in a two-dimensional image into one of these classes. Sky is usually the easiest–it’s blue or white. The top and bottom edges of most photos are aligned with the horizon, which helps the software identify the ground plane. And the windows of a multistory building are often arranged in parallel lines with a common vanishing point–a strong indication of a vertical surface.
But Hoiem didn’t explicitly teach the software these rules. The system is based on machine-learning algorithms, meaning that it figures out its own rules of thumb by recognizing statistical patterns in hundreds of images in which the ground, sky, and vertical surfaces have been prelabeled by humans.
“We didn’t have to start completely from scratch, fortunately,” says Hoiem. “There’s been a lot of work on how we represent color and texture and structure. There is an existing algorithm for recognizing the vanishing point of a group of lines. And people have worked a lot on recognizing objects like people or cars. But nobody had thought that maybe you can combine all of these and learn to recognize the actual geometry of a scene.”
Once Fotowoosh has identified the major surfaces in a scene, it joins them into a 3-D model using the Virtual Reality Markup Language file format, or VRML. The software peels off parts of the two-dimensional image and pastes them onto the appropriate surfaces in the model, a process called texture mapping.
Currently, the finished models can only be viewed inside a Web browser equipped with a special extension for viewing VRML files. But in the beta version of Fotowoosh, due next month, the models will be displayed using the more common Flash format already included in most browsers, according to Pishevar. (The Fotowoosh home page includes a video demonstrating the end product for several sample images.)
Right now, the system isn’t very good at separating discrete objects that should be in the foreground, such as pedestrians in a street scene, from background surfaces, such as walls. But Hoiem is working on that. “In a year or possibly less, you’ll be able to take a photo of an alley with all sorts of cars and people, and create a 3-D model where those are all seen as separate 3-D foreground objects,” he says.
Gain the insight you need on robotics at EmTech MIT.