When you can’t attend a crucial meeting or presentation, is a videoconference almost as good as being there? Typically not, unless you’ve got expensive professional staff available to handle the video.
To tackle this problem, Microsoft Research has built prototype systems that use automated camera management and 360-degree panoramic camera technologies. Developers presented the results last week in Seattle at the annual conference of the Association for Computing Machinery’s Special Interest Group on Computer-Human Interaction.
Those Pesky Humans
Digital video quality is soaring, and the cost of digital storage media and digital video cameras continues to plummet. Even so, a camera operator costs $500 or more for a single lecture, says Anoop Gupta, senior researcher in Microsoft Research’s (MSR) collaboration and multimedia systems group. Gupta bases his cost figures on MSR’s own experience in making streaming versions of talks by visitors available over the corporate network, both live and for later playback.
Apart from cost, the presence of a cameraperson, usually an outsider, has a psychological impact that tends to change the dynamics of group meetings or lectures.
The researchers attacked both problems by designing a system that mimics the actions of human camera operators. They began by interviewing seven of Microsoft’s in-house video producers about the camera techniques they used and then evaluated those techniques to identify which ones could be duplicated by automation.
They designed a system that edits a lecture while it is in progress, making the video appropriate for both live and on-demand viewing. The system follows the camera rules that human operators use. For instance, if an initial (or “establishing”) shot of the speaker is taken from one side, all subsequent shots should be taken from that side. Otherwise, if the speaker is moving, a shot from the opposite side would confuse the viewer by making it appear that the speaker has changed direction. Additionally, the camera should stay put unless the speaker moves out of an already established zone of movement.
These camera-management rules are programmed into the software. So are automatic editing rules, like “when a person in the audience asks a question, promptly show that person” and “don’t cut to a camera that is too dark.” The system uses three cameras: one to track the speaker, one for the audience and questioners, and a third to show presentation graphics.
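A minimal sketch of how editing rules like these might be expressed in code follows. It is an illustration only, not MSR’s software: the function name `choose_shot`, the brightness scale, the `MIN_BRIGHTNESS` threshold, and the rule about cutting to the slides are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CameraState:
    name: str
    brightness: float  # hypothetical measure: 0.0 (black) to 1.0 (fully lit)


MIN_BRIGHTNESS = 0.2  # assumed threshold for "too dark"


def choose_shot(current, cameras, question_in_progress, slide_changed=False):
    """Return the name of the camera to show next.

    `cameras` maps the names "speaker", "audience" and "slides"
    to CameraState objects. `current` is the camera now on air.
    """
    def usable(name):
        # Rule from the article: don't cut to a camera that is too dark.
        return cameras[name].brightness >= MIN_BRIGHTNESS

    # Rule from the article: when someone asks a question, show that person.
    if question_in_progress and usable("audience"):
        return "audience"
    # Assumed rule, not from the article: show new presentation graphics.
    if slide_changed and usable("slides"):
        return "slides"
    # Default shot: the speaker.
    if usable("speaker"):
        return "speaker"
    # Nothing better is available, so stay on the current camera.
    return current
```

Ordering the checks this way gives the audience question priority over everything else, while the brightness test acts as a veto on any cut.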
Other labs, including Bellcore (now named Telcordia), have been pursuing similar goals in recent years, and there are some commercial systems on the market. The MSR approach adds some twists. Most notably, while some systems use an electronic tag on the speaker to track the speaker’s movements, MSR’s prototype uses motion sensors to orient the speaker camera, and the audience camera uses microphone triangulation to spot a questioner.
Can viewers tell the difference? MSR researchers asked viewers to tell lectures shot and edited by human operators from those handled by the automated system, and a bare majority couldn’t, says Gupta. MSR itself has been using its prototype to webcast lectures on the corporate network.
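The article does not say how MSR’s microphone triangulation works. One common way to locate a sound source is to measure how much later the sound reaches one microphone than another. The sketch below shows that idea for a single pair of microphones, assuming the questioner is far away compared with the microphone spacing; the function name and parameters are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second in air at roughly 20 °C


def bearing_from_delay(delay_s, mic_spacing_m):
    """Estimate the direction of a sound source from an arrival-time delay.

    delay_s is how much later the sound reached the second microphone
    than the first, in seconds. The result is the angle in degrees
    away from the line perpendicular to the microphone pair.
    Far-field assumption: the source is much farther away than
    the distance between the microphones.
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A single pair gives only an angle. Combining the angles from two or more pairs at known positions is what would let a system pin down where in the room the questioner is sitting.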
Seeing the Full View
MSR has also been trying to find ways to make telemeetings seem more natural to viewers. Most teleconferenced meetings today use a single camera in a corner of the room. That produces a decidedly one-sided viewpoint for remote viewers. One solution is to put an “omnidirectional” camera in the center of the meeting, enabling the remote viewers/participants to see everyone at once.
Similar in concept to an omnidirectional microphone, which picks up audio in a 360-degree pattern, an omnidirectional camera picks up everything around it by placing a parabolic mirror directly beneath a camera that is pointed straight down. While this produces a “fish-eye” distortion of the room and everything in it, software can “unwrap” the 360-degree image using well-understood computer vision techniques.
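The unwrapping step can be sketched as a change from polar to rectangular coordinates: each column of the panorama corresponds to a direction around the camera, and each row to a distance from the center of the circular image. The example below is a simplified illustration that ignores the exact mirror shape and uses nearest-pixel lookup. The function name, its parameters, and the choice to put the inner ring in the top row are assumptions.

```python
import math


def unwrap(image, cx, cy, r_inner, r_outer, out_w, out_h):
    """Unwrap a circular mirror image into a rectangular panorama.

    image is a list of rows of pixel values. (cx, cy) is the center of
    the mirror in the image, and r_inner..r_outer is the ring of pixels
    that holds the useful picture. Pixels outside the source are set to 0.
    """
    src_h = len(image)
    src_w = len(image[0])
    panorama = []
    for y in range(out_h):
        # Row 0 samples the inner ring; the last row samples the outer ring.
        r = r_inner + (r_outer - r_inner) * y / max(out_h - 1, 1)
        row = []
        for x in range(out_w):
            # Each output column is one direction around the full circle.
            theta = 2.0 * math.pi * x / out_w
            sx = int(round(cx + r * math.cos(theta)))
            sy = int(round(cy + r * math.sin(theta)))
            if 0 <= sx < src_w and 0 <= sy < src_h:
                row.append(image[sy][sx])
            else:
                row.append(0)
        panorama.append(row)
    return panorama
```

A production version would also correct for the mirror’s curvature and blend neighboring pixels rather than picking the nearest one.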
Omnidirectional cameras are expensive, at $10,000 or more, so the MSR researchers looked for less costly ways to do the same thing. They found they could produce the same 360-degree panorama, with better resolution, by using five inexpensive webcams set in a circle, together with software to correct distortion. That cut the cost of the camera setup dramatically, to $300.
So far, Microsoft has not announced any plans for commercializing its meeting-room software and is unlikely to offer any of the accompanying hardware. However, the Redmond software giant has not been shy about turning over work done in its four research labs when it can. So one of these days, don’t be too surprised if some of these video technologies show up in your corporate conference room.