
Photosynth for Video and Other TechFest Treats

More highlights from Microsoft’s annual research event in Redmond, WA.
February 25, 2009

Every year, Microsoft hosts an open house called TechFest to showcase some of its flashier research projects. This year, the event in Redmond, WA, boasted 37 demos, ranging from gesture-based interfaces to augmented reality and better image search. Below are brief summaries of some of the projects shown on Tuesday that caught my eye.

1. Photosynth for Video

Three mobile video streams are combined into one panoramic video in real time using new software developed by Microsoft researchers. Credit: Microsoft Research

Given the popularity of Photosynth, it’s not surprising that Microsoft researchers are now trying to extend the technology to video. The original software seamlessly stitches together pictures of a single location, taken from different cameras, to create a zoomable, pannable panoramic image. The new idea is that multiple people (eyewitnesses at a news event or fans at a concert, for instance) record video on their cell phones and stream it to a central server. Using the phones’ locations and image-recognition algorithms, software organizes and pieces the mobile streams together into a larger scene.

Ayman Kaheel, an engineer in Microsoft’s Cairo lab, demonstrated the software in the convention hall by holding up two cell phones, in camera mode, at different heights and pointed in slightly different directions. On his laptop, the two video feeds were merged in real time to create a larger, more complete video. Impressive stuff.
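Microsoft hasn’t published the algorithm, but the core pipeline (find matching features in overlapping frames, estimate how each view maps onto the others, then warp and blend) can be sketched with off-the-shelf tools. Here is a minimal Python illustration using OpenCV’s generic image stitcher on synchronized frames from two streams; the stream URLs are hypothetical stand-ins for the phones’ feeds.

```python
# Sketch only: stitch synchronized frames from two video streams into one
# panoramic frame, using OpenCV's generic stitcher as a stand-in for
# Microsoft's unpublished real-time method. The stream URLs are hypothetical.
import cv2

captures = [cv2.VideoCapture(url) for url in (
    "rtsp://phone-a/stream",  # hypothetical phone feed
    "rtsp://phone-b/stream",
)]
stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)

while True:
    frames = []
    for cap in captures:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    if len(frames) < len(captures):
        break  # one of the streams ended
    # Feature matching, homography estimation, and blending happen inside.
    status, panorama = stitcher.stitch(frames)
    if status == cv2.Stitcher_OK:
        cv2.imshow("panoramic video", panorama)
    if cv2.waitKey(1) == 27:  # Esc quits
        break
```

Stitching every frame from scratch like this is too slow for true real time; a production system would presumably estimate the camera alignment once and reuse it across subsequent frames.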

2. Writing in the Air

One of the problems with video game systems is that it’s hard for a player to enter text, whether to name a game character or to chat with other players on a network. Since some of these systems are played on computers with Web cameras or use infrared cameras (like the Wii), researchers at Microsoft’s research center in Beijing reasoned that hand-waving gestures could replace clunky traditional text input.

The researchers wrote software that tracks the movement of a colorful object, such as an apple or a ball, in a user’s hand and interprets, from the object’s path, the character that the user outlines in the air. Hsiao-Wuen Hon, the director of Microsoft Research Asia, says that the system works well for Chinese characters and should work even better with English ones, since there are far fewer of them.
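The tracking half of the idea is simple enough to sketch: segment each frame by the object’s color, take the centroid of the matching pixels, and accumulate the centroids into a stroke. In the minimal Python sketch below, the HSV color range and the final recognize() step are hypothetical placeholders, not Microsoft’s actual implementation.

```python
# Sketch only: track a brightly colored object by color segmentation and
# record its centroid path as an in-air "stroke." The HSV range and the
# recognizer are hypothetical assumptions, not Microsoft's implementation.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)           # any webcam
lower = np.array([0, 120, 120])     # assumed HSV range for, say, a red apple
upper = np.array([10, 255, 255])
stroke = []                         # the character traced in the air

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower, upper)    # keep only object-colored pixels
    m = cv2.moments(mask)
    if m["m00"] > 0:                         # object visible: record centroid
        stroke.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    cv2.imshow("tracking", mask)
    if cv2.waitKey(1) == 27:                 # Esc ends the stroke
        break

# The stroke would then go to a handwriting recognizer, much as stylus ink
# does; recognize() is a hypothetical stand-in for that step.
# character = recognize(stroke)
```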

3. A Color Palette for Better Image Search

Today’s image search engines do a decent job, up to a point. Search for a “tiger” and you’ll generally get a collection of orange, white, and black big cats. But right now, it’s nearly impossible to tweak a search to find, say, a tiger on a white background, or a black-and-white tiger against a blue sky.

Xian-Sheng Hua, a researcher at the Beijing center and a 2008 TR35 honoree, thinks he’s found a better way to home in on the right image. His search interface provides a color palette beside the results and a gridded square that a user can fill with colors from the palette to winnow the search. For instance, to find a tiger against a blue sky, simply fill a few cells at the top of the square with blue and search for “tiger” again. This color-based filtering eliminates the need for extra metadata or tags describing the scene. Hon says that such an interface would be relatively simple to integrate into today’s search engines.
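Hua’s exact ranking method isn’t described here, but the idea lends itself to a simple sketch: summarize each candidate image by the average color of each cell in a small grid, then rerank the text-search results by how closely those cells match the ones the user painted. In the Python sketch below, the grid size, distance metric, and file names are all illustrative assumptions.

```python
# Sketch only: rerank images by how well an N x N grid of average cell
# colors matches the cells a user painted. Illustrative, not Hua's method.
import numpy as np
from PIL import Image

N = 4  # assumed 4 x 4 grid

def grid_signature(path):
    """Average RGB color of each cell in an N x N grid over the image."""
    img = np.asarray(Image.open(path).convert("RGB").resize((N * 16, N * 16)),
                     dtype=float)
    return img.reshape(N, 16, N, 16, 3).mean(axis=(1, 3))  # shape (N, N, 3)

def score(sig, painted):
    """painted maps (row, col) -> RGB for the cells the user filled in."""
    return sum(np.linalg.norm(sig[r, c] - np.array(rgb, dtype=float))
               for (r, c), rgb in painted.items())

# "Tiger against a blue sky": paint the top row of the grid blue, then
# rerank the ordinary text-search results (file names are hypothetical).
painted = {(0, c): (90, 150, 230) for c in range(N)}
candidates = ["tiger1.jpg", "tiger2.jpg", "tiger3.jpg"]
ranked = sorted(candidates, key=lambda p: score(grid_signature(p), painted))
```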

4. Surface Goes 3-D

Surface seems to be the darling of Microsoft Research. The multi-touch tabletop is the high-profile project that took the fast track from the lab to consumers. And now Andy Wilson, one of the researchers who worked on Surface, is directing his attention to a touch interface in the sky.

At TechFest, Wilson demonstrated a projector-and-infrared-camera system that produces images inside a dome and can recognize gestures made by people’s hands. In the demo, images from Microsoft’s WorldWide Telescope, which provides a virtual tour of the night sky, were projected onto the dome; a researcher panned and zoomed across the stars with the wave of a hand and a pinch of the thumb. Wilson believes that his team could build the system inexpensively enough for it to be used in school planetariums, or anywhere people want to interact with large panoramic projections.
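Microsoft hasn’t detailed how the gestures map to camera motion, but the pan-and-pinch behavior follows a common pattern: one tracked point pans the view by its motion, and two points zoom by the change in the distance between them. A minimal Python sketch under that assumption, with fingertip detection from the infrared camera taken as given:

```python
# Sketch only: map tracked fingertip positions to pan/zoom of a panoramic
# view. This is a common gesture-handling pattern, assumed here rather than
# Wilson's actual implementation; fingertip detection is omitted.
import math

def update_view(view, prev_pts, pts):
    """view = {'cx': ..., 'cy': ..., 'zoom': ...}; pts = [(x, y), ...]."""
    if len(pts) == 1 and len(prev_pts) == 1:
        # A single waving hand pans; divide by zoom so motion feels uniform.
        view["cx"] -= (pts[0][0] - prev_pts[0][0]) / view["zoom"]
        view["cy"] -= (pts[0][1] - prev_pts[0][1]) / view["zoom"]
    elif len(pts) == 2 and len(prev_pts) == 2:
        # A thumb pinch zooms by the ratio of finger distances across frames.
        d_prev = math.dist(prev_pts[0], prev_pts[1])
        if d_prev > 0:
            view["zoom"] *= math.dist(pts[0], pts[1]) / d_prev
    return view
```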
