MIT Technology Review

Remaking the Meeting-Cam

Microsoft prototypes automatically track speakers and provide panoramic views of the conference room.

When you can’t attend a crucial meeting or presentation, is a videoconference almost as good as being there? Typically not, unless you’ve got expensive professional staff available to handle the video.

To tackle this problem, Microsoft Research has built prototype systems that use automated camera management and 360-degree panoramic camera technologies. The researchers presented the results last week at the annual conference of the Association for Computing Machinery’s Special Interest Group on Computer-Human Interaction, held in Seattle.

Those Pesky Humans

Digital video quality is soaring, and the cost of digital storage media and digital video cameras continues to plummet. Even so, a single camera operator costs $500 or more per lecture, says Anoop Gupta, senior researcher in Microsoft Research’s (MSR) collaboration and multimedia systems group. Gupta bases his cost figures on MSR’s own experience in making streaming versions of visitors’ talks available over the corporate network, both live and for later playback.

Apart from cost, the presence of a cameraperson, usually an outsider, has a psychological impact that tends to change the dynamics of group meetings or lectures.

The researchers attacked both problems by designing a system that mimics the actions of human camera operators. They began by interviewing seven of Microsoft’s in-house video producers about the camera techniques they used and then evaluated those techniques to identify which ones could be duplicated by automation.

They designed a system that simultaneously edits a lecture while it is in process, making the video appropriate for both live and on-demand viewing. For instance, if an initial (or “establishing”) shot of the speaker is taken from one side, all subsequent shots should be taken from that side. Otherwise, if the speaker is moving, a shot from the opposite side would confuse the viewer by making it appear that the speaker has changed direction. Additionally, the camera should stay put unless the speaker moves out of an already established zone of movement.

These camera-management rules are programmed into the software. So are automatic editing rules, such as “when a person in the audience asks a question, promptly show that person” and “don’t cut to a camera that is too dark.” The system uses three cameras: one to track the speaker, one for the audience and questioners, and a third to show presentation graphics.
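
Taken together, rules like these amount to a small decision procedure over the available feeds. A minimal sketch of that idea follows; the function name, event labels, and brightness threshold are illustrative assumptions, not details of MSR's software:

```python
# A hypothetical sketch of rule-based camera selection, in the spirit of
# the rules described above. Names and thresholds are illustrative.

def pick_camera(event, current, brightness):
    """Choose which of the three feeds should be live.

    event:      'question', 'slide_change', or None
    current:    the camera currently live ('speaker', 'audience', 'slides')
    brightness: dict mapping camera name -> average frame brightness, 0.0-1.0
    """
    MIN_BRIGHTNESS = 0.25  # rule: don't cut to a camera that is too dark

    # Rule: when someone in the audience asks a question, promptly show them.
    if event == 'question' and brightness['audience'] >= MIN_BRIGHTNESS:
        return 'audience'
    # Rule: when the slides change, show the presentation graphics.
    if event == 'slide_change' and brightness['slides'] >= MIN_BRIGHTNESS:
        return 'slides'
    # Rule: stay put rather than cutting without a reason.
    if brightness[current] >= MIN_BRIGHTNESS:
        return current
    # Fall back to the speaker camera if the current feed has gone dark.
    return 'speaker'

print(pick_camera('question', 'speaker',
                  {'speaker': 0.6, 'audience': 0.5, 'slides': 0.7}))
```

A real system would layer timing constraints on top of this (minimum shot length, the establishing-shot rule described above), but the core is the same: a fixed priority of hand-coded editing rules.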

Other labs, including Bellcore (now named Telcordia), have been pursuing similar goals in recent years, and there are some commercial systems on the market. The MSR approach adds some twists. Most notably, where some systems track the speaker by means of an electronic tag worn on the body, MSR’s prototype orients the speaker camera with motion sensors, and the audience camera uses microphone triangulation to locate a questioner.
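
Microphone triangulation of this kind typically rests on the difference in a sound's arrival time at two or more microphones. A minimal two-microphone sketch, assuming a distant (far-field) source and a nominal speed of sound; the names and figures are illustrative, not taken from the MSR system:

```python
import math

# Far-field bearing estimate from one microphone pair via the time
# difference of arrival (TDOA). A real array uses several pairs to get
# a full direction rather than a single angle.

SPEED_OF_SOUND = 343.0  # meters per second, at room temperature

def bearing_from_tdoa(tdoa_s, mic_spacing_m):
    """Angle of the sound source off broadside, in degrees.

    tdoa_s: arrival-time difference between the two mics, in seconds.
    For a far-field source, path difference = spacing * sin(angle),
    so angle = asin(speed_of_sound * tdoa / spacing).
    """
    ratio = SPEED_OF_SOUND * tdoa_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.degrees(math.asin(ratio))

# A source 30 degrees off broadside, mics 0.2 m apart:
# path difference = 0.2 * sin(30 deg) = 0.1 m, so TDOA = 0.1 / 343 s.
print(round(bearing_from_tdoa(0.1 / 343.0, 0.2), 1))  # 30.0
```

In practice the time difference itself is estimated by cross-correlating the two microphone signals; the geometry above then converts that delay into a direction the camera can turn toward.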

Can viewers tell the difference? MSR researchers asked viewers to distinguish lectures shot and edited by human operators from those handled by the automated system, and a bare majority could not, says Gupta. MSR itself has been using its prototype to webcast lectures on the corporate network.

Seeing the Full View

MSR has also been trying to find ways to make telemeetings seem more natural to viewers. Most teleconferenced meetings today use a single camera in a corner of the room, which produces a decidedly one-sided viewpoint for remote viewers. One solution is to put an “omnidirectional” camera in the center of the meeting, enabling remote participants to see everyone at once.

Similar in concept to an omnidirectional microphone, which picks up audio in a 360-degree pattern, an omnidirectional camera picks up everything around it by placing a parabolic mirror directly beneath a camera that is pointed straight down. While this produces a “fish-eye” distortion of the room and everything in it, software can “unwrap” the 360-degree image using well-understood computer vision techniques.
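
That unwrapping step is essentially a change of coordinates: each column of the output panorama corresponds to an angle around the mirror's axis, and each row to a radius in the round source image. A minimal sketch under assumed simplifications (a square source image and a linear radius model; a real system would calibrate the mirror's actual projection):

```python
import math

# Sketch of unwrapping a round mirror image into a rectangular panorama
# by sampling the source image along concentric rings. The linear radius
# model and pure-Python pixel loop are simplifications for illustration.

def unwrap(src, size, out_w, out_h, r_min, r_max):
    """src:  function (x, y) -> pixel value in the round source image
    size:    width and height of the (square) source image, in pixels
    r_min/r_max: inner and outer ring radii of the usable mirror image
    Returns out_h rows of out_w sampled pixels."""
    cx = cy = size / 2.0  # mirror axis, assumed at the image center
    out = []
    for row in range(out_h):
        # Top row of the panorama maps to the outer ring of the mirror.
        r = r_max - (r_max - r_min) * row / (out_h - 1)
        line = []
        for col in range(out_w):
            theta = 2.0 * math.pi * col / out_w  # angle around the axis
            x = int(cx + r * math.cos(theta))
            y = int(cy + r * math.sin(theta))
            line.append(src(x, y))
        out.append(line)
    return out

# Toy source image that just reports the coordinates it was sampled at.
panorama = unwrap(lambda x, y: (x, y), 200, 360, 40, 20, 95)
print(len(panorama), len(panorama[0]))  # 40 360
```

The same resampling idea, with a calibrated lookup table instead of the linear model, is what lets software turn the fish-eye view into an ordinary-looking strip in real time.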

Omnidirectional cameras are expensive ($10,000 or more), so the MSR researchers looked for less costly ways to do the same thing. They found they could produce the same 360-degree panorama, and with better resolution, by using five inexpensive webcams set in a circle, together with software to correct distortion. That cut the cost of the camera setup dramatically, to about $300.

So far, Microsoft has not announced any plans for commercializing its meeting-room software and is unlikely to offer any of the accompanying hardware. However, the Redmond software giant has not been shy about turning work done in its four research labs into products when it can. So one of these days, don’t be too surprised if some of these video technologies show up in your corporate conference room.
