Picture This

Image-based rendering creates photorealistic 3-D models from plain old pictures.

Alan Leo

February 27, 2002

At first light on July 4, 1995, Gary Bishop and Leonard McMillan hit the grounds of the University of North Carolina for a most interesting computer imaging project. The computer science professor and his then-graduate student needed to capture plenoptic images (360-degree views) of the school’s bucolic landscape in Chapel Hill. The photos were vital to their research: an attempt to create photorealistic three-dimensional computer graphics. They shot perfectly aligned images for more than an hour, an excruciatingly slow and exact process in the Carolina heat. “We got back to the lab,” Bishop remembers, “and found that a cable was disconnected. Not one picture made it to disk.” So the two hauled their gear back out in the middle of the day. The temperature, uncomfortable at 6 a.m., was at noon almost unbearable. But Bishop and McMillan knew they were onto something even hotter: an entirely new way to create 3-D graphics that matched the realism of the best photography.

Today, McMillan is a professor of computer science at the Massachusetts Institute of Technology, and he and Bishop belong to a growing field of research called image-based rendering: creating 3-D computer images and animations from photographs. Recently, the technology has begun to move out of computer science labs and into the marketplace. It has been showcased in the Super Bowl and in such movies as The Matrix and Mission: Impossible II.

Traditionally, 3-D animation has been a geometry problem. 3-D models are built around a system of geometric shapes, or polygons. The software stores the coordinates of each polygon, and applies algorithms to animate them. This works well when the 3-D models are simple, such as a cube. A single polygon can store information for hundreds or thousands of pixels, dramatically reducing the amount of digital data needed to represent the scene.

But as image rendering engines improve, and rendering gets more sophisticated, Bishop says, old-fashioned geometric modeling starts to make less sense. Today, the best 3-D engines render more polygons than screens have pixels. “When a triangle represents half a pixel, then what’s the point?” Bishop asks. “I have to start thinking about some other kind of representation.”
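Bishop’s point can be put in rough numbers. The sketch below is only an illustration: the storage costs (three 32-bit coordinates per vertex, three bytes per pixel) are assumptions, not figures from Bishop or McMillan, and the function name is hypothetical.

```python
# Assumed storage costs, chosen for illustration only (not from the article).
BYTES_PER_VERTEX = 3 * 4                    # x, y, z stored as 32-bit floats
BYTES_PER_TRIANGLE = 3 * BYTES_PER_VERTEX   # three vertices: 36 bytes
BYTES_PER_PIXEL = 3                         # 8-bit red, green, blue


def geometry_cost_ratio(pixels_covered):
    """Bytes for one triangle divided by bytes for the raw pixels it covers.

    A value below 1 means the triangle is the cheaper representation;
    a value above 1 means storing the pixels directly would be cheaper.
    """
    return BYTES_PER_TRIANGLE / (pixels_covered * BYTES_PER_PIXEL)


if __name__ == "__main__":
    for covered in (1000, 12, 0.5):
        ratio = geometry_cost_ratio(covered)
        print(f"triangle covering {covered} pixels: cost ratio {ratio:.3f}")
```

Under these assumptions, a triangle that covers 1,000 pixels costs about 1 percent of what the raw pixels would, while a triangle that covers half a pixel costs 24 times as much as the pixel itself.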

Bishop and McMillan began exploring an alternative approach that creates 3-D models by analyzing a group of 2-D photographs taken from slightly different perspectives. Bishop, with a graciousness to match his Carolina drawl, hands McMillan the lion’s share of the credit for designing the software that makes image-based rendering possible.

Back in 1995, they were determined to demonstrate their 3-D technique using pictures of the Chapel Hill campus because realistic outdoor scenes are some of the hardest to model using the older, geometric approach. But image-based rendering creates photorealistic outdoor views, making it ideal for applications such as virtual tourism, Bishop says. Today he envisions image-based 3-D views that allow virtual tourists into sites too remote or fragile for real visitors, such as the inside of the pyramids, the bottom of the sea or the surface of the moon.

Today, Bishop says, the biggest challenge is to combine image-based rendering with geometric modeling to create models called “impostors.” “Leonard took off originally in the direction of what could you do if you don’t have geometry,” Bishop says. “But the neatest work is in the area in between” geometric modeling and image-based rendering. Impostors combine the advantages of geometric models (certainty about an object’s shape from any angle) with the photorealistic detail of image-based rendering. Currently, McMillan is working on several projects that have grown out of image-based rendering, including how to create geometric models, called visual hulls, from images.
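The idea behind a visual hull can be sketched in a few lines. This toy version is not McMillan’s software: it assumes three axis-aligned, orthographic silhouettes and a small voxel grid, where real systems use calibrated perspective cameras, and every name in it is hypothetical. A voxel is kept only if it falls inside the object’s outline in every view.

```python
from itertools import product


def silhouette(voxels, n, axes):
    """Project a set of (x, y, z) voxels onto the plane of the two given axes."""
    a, b = axes
    mask = [[False] * n for _ in range(n)]
    for v in voxels:
        mask[v[a]][v[b]] = True
    return mask


def visual_hull(sil_xy, sil_xz, sil_yz, n):
    """Keep each voxel whose projection lies inside all three silhouettes."""
    return {
        (x, y, z)
        for x, y, z in product(range(n), repeat=3)
        if sil_xy[x][y] and sil_xz[x][z] and sil_yz[y][z]
    }


# Hypothetical test object: a rough ball of voxels in an 8 x 8 x 8 grid.
N = 8
CENTER = (N - 1) / 2
ball = {
    (x, y, z)
    for x, y, z in product(range(N), repeat=3)
    if (x - CENTER) ** 2 + (y - CENTER) ** 2 + (z - CENTER) ** 2 <= 9
}

# "Photograph" the ball from three directions, then carve the hull back out.
hull = visual_hull(
    silhouette(ball, N, (0, 1)),
    silhouette(ball, N, (0, 2)),
    silhouette(ball, N, (1, 2)),
    N,
)

if __name__ == "__main__":
    print(f"object voxels: {len(ball)}, hull voxels: {len(hull)}")
```

The hull always contains the original object and casts exactly the same silhouettes, but it can be larger, because a silhouette cannot reveal dents or hollows that are hidden from every view.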

Researchers around the world have taken up the standard. Millions watched the best-known example of image-based rendering, developed at Carnegie Mellon University, in last year’s Super Bowl: while a network of cameras filmed players from multiple angles, software rendered new images to fill in the angles in between the cameras, producing a continuous 3-D view of the play in near-real time.

As the technology improves, Bishop predicts that such 3-D entertainment will become more common, and even put control of the camera angle in the hands of the viewer. But as for him, he’s happy to leave the director in charge. “The difference between my home movies and Citizen Kane,” he says, “is that the director controls where the camera is.”
