Skip to Content
Artificial intelligence

Powerful computer vision algorithms are now small enough to run on your phone

October 11, 2019
An image of hand gestures being recognized on a mobile phone
An image of hand gestures being recognized on a mobile phoneHand illustrations: Noun Project / Ms. Tech

Researchers have shrunk state-of-the-art computer vision models to run on low-power devices.

Growing pains: Visual recognition is deep learning’s strongest skill. Computer vision algorithms are analyzing medical images, enabling self-driving cars, and powering face recognition. But training models to recognize actions in videos has grown increasingly expensive. This has fueled concerns about the technology’s carbon footprint and its increasing inaccessibility in low-resource environments.

The research: Researchers at the MIT-IBM Watson AI Lab have now developed a new technique for training video recognition models on a phone or other device with very limited processing capacity. Typically, an algorithm will process video by splitting it up into image frames and running recognition algorithms on each of them. It then pieces together the actions shown in the video by seeing how the objects change over subsequent frames. The method requires the algorithm to “remember” what it has seen in each frame and the order in which it has seen it. This is unnecessarily inefficient.

In the new approach, the algorithm instead extracts basic sketches of the objects in each frame, and overlays them on top of one another. Rather than remember what happened when, the algorithm can get an impression of the passing of time by looking at how the objects shift through space in the sketches. In testing, the researchers found that the new approach trained video recognition models three times faster than the state of the art. It was also able to quickly classify hand gestures with a small computer and camera running only on enough energy to power a bike light.

Why it matters: The new technique could help reduce lag and computation costs in existing commercial applications of computer vision. It could, for example, make self-driving cars safer by speeding up their reaction to incoming visual information. The technique could also unlock new applications that previously weren’t possible, such as by enabling phones to help diagnose patients or analyze medical images.

Distributed AI: As more and more AI research gets translated into applications, the need for tinier models will increase. The MIT-IBM paper is part of a growing trend to shrink state-of-the-art models to a more manageable size.

To have more stories like this delivered directly to your inbox, sign up for our Webby-nominated AI newsletter The Algorithm. It's free.

Deep Dive

Artificial intelligence

What does GPT-3 “know” about me? 

Large language models are trained on troves of personal data hoovered from the internet. So I wanted to know: What does it have on me?

An AI that can design new proteins could help unlock new cures and materials 

The machine-learning tool could help researchers discover entirely new proteins not yet known to science.

Stay connected

Illustration by Rose Wong

Get the latest updates from
MIT Technology Review

Discover special offers, top stories, upcoming events, and more.

Thank you for submitting your email!

Explore more newsletters

It looks like something went wrong.

We’re having trouble saving your preferences. Try refreshing this page and updating them one more time. If you continue to get this message, reach out to us at customer-service@technologyreview.com with a list of newsletters you’d like to receive.