Augmented Reality Meets Gesture Recognition

A new app superimposes imagery over your smart-phone view, and lets you interact with it via hand gestures.

Tom Simonite

September 15, 2011

To make its business software more effective, HP recently paid $10 billion for Autonomy, a U.K. software company that specializes in machine learning. But it turns out that Autonomy has developed image-processing techniques for gesture-recognizing augmented reality—the type of technology that could be more attractive to consumers than IT managers.

**Enhancing reality:** The Aurasma app overlays interactive content on the real world, such as a page in a magazine. The app can recognize gestures, too, letting a user interact with virtual objects.

Augmented reality involves layering computer-generated imagery on top of a view of the real world as seen through the camera of a smart phone or tablet computer. So someone looking at a city scene through a device could see tourist information on top of the view.

Autonomy’s new augmented reality technology, called Aurasma, goes a step further: it recognizes a user’s hand gestures. This means a person using the app can reach out in front of the device to interact with the virtual content. Previously, interacting with augmented reality content involved tapping the screen. One demonstration released by Autonomy creates a virtual air hockey game on top of an empty tabletop—users play by waving their hands.

Autonomy’s core technology lets businesses index and search data that conventional, text-based search engines struggle with. Examples are audio recordings of sales calls, or video from surveillance cameras. “We use the same core technology in Aurasma to identify images or scenes and retrieve the relevant content to put on top,” says Aurasma director Matt Mills, who presented the app at the DEMO technology conference in Santa Clara, California, this week.

Autonomy quietly launched Aurasma in May, and GQ magazine has already used it to make some of its pages interactive. But the company announced only recently that Aurasma can track and respond to gestures to make virtual objects interactive. “We’ve now added finger recognition,” says Mills, “so you get an experience a bit like using the Kinect. You reach out your hand and the content responds.”

The Aurasma app, available for iPhone, iPad, and Android smart phones, constantly creates a visual “fingerprint” of what’s in front of it, and compares it to a set of fingerprints for the area where the app is being used. When it identifies a scene, perhaps a photo on a billboard, the Statue of Liberty, or a house on your street, interactive video or imagery is overlaid on top of the view. Users can also create their own content by assigning a photo or video to a particular real-world scene. The virtual content is carefully lined up with the visual features it was programmed for. This means a massive dinosaur can rear its head behind the Golden Gate Bridge, as seen in this video.
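For readers curious how matching a camera view against stored fingerprints might work in principle, here is a minimal sketch. It is not Autonomy's method, which the company has not detailed here; it swaps in a simple "average hash" as a stand-in fingerprint, and the names `fingerprint`, `hamming`, `match_scene`, and the `max_distance` threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional

Image = List[List[int]]  # grayscale pixel rows, values 0-255 (assumed input format)


def fingerprint(image: Image) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def match_scene(frame: Image, nearby: Dict[str, int], max_distance: int = 2) -> Optional[str]:
    """Return the name of the closest stored scene, or None if nothing is close enough.

    `nearby` stands for the set of fingerprints stored for the user's current area.
    """
    fp = fingerprint(frame)
    best = min(nearby, key=lambda name: hamming(fp, nearby[name]), default=None)
    if best is not None and hamming(fp, nearby[best]) <= max_distance:
        return best
    return None
```

In this toy version, a camera frame that differs slightly in brightness from a stored scene still produces the same bits and so still matches, while an unrelated view falls outside the threshold and triggers no overlay. A real system would need fingerprints that also tolerate changes in angle, scale, and lighting.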

Aurasma’s closest competitor is Layar, a Netherlands company that offers an augmented-reality platform that others can add content to. However, Layar has so far largely relied on GPS location to position content, and only recently made it possible to position virtual objects more precisely, using image recognition. And Layar does not recognize users’ gestures.

Mills says that Aurasma’s ability to track objects precisely means it can be used for more than just advertising. In another demonstration, a smart phone running the app, when pointed at the back of a broadband router, revealed graphics and text explaining what each port was for.

Although mobile phones and tablets are the best interfaces available for augmented reality today, the experience is still somewhat clunky, since a person must hold up a device with one hand at all times. Sci-fi writers and technologists have long forecast that the technology would eventually be delivered through glasses. Recognizing hand movements would be useful for such a design, since there wouldn’t be the option of using a touch screen or physical buttons.
