Ketan Banjara’s living room isn’t cluttered with remote controls. To shush the music, he simply holds a finger up to his lips. And when he gets up from the couch and leaves the room, his TV screen pauses automatically.
Banjara is a cofounder of PredictGaze, a startup that combines gaze detection, gesture recognition, and facial-feature recognition to create more natural ways to control everything from your TV to your car.
While many people are just getting their hands on their first touch-screen gadget, PredictGaze is one of a slew of companies betting that touch-free controls will be the next big thing. With front-facing cameras being embedded in all sorts of gadgets, it’s not hard to imagine that bet paying off.
So far, the company’s software has been integrated into an iOS game to enable users to control a food-guzzling, gap-toothed monster by moving their heads. The technology is also being tested in some stores in Japan where cameras track a customer’s gender, smile, and time spent in front of a product in hopes that the data can be used to increase sales by offering a hesitant shopper a coupon at the right time or by improving product placement.
PredictGaze imagines all kinds of applications for its technology, including preventing your five-year-old from watching adult content on TV and determining that you’re starting to fall asleep behind the wheel of your car.
The company’s technology combines machine-learning and computer-vision algorithms in software that can use a standard VGA camera to figure out where you’re looking, what your gender is, whether you’re smiling, and more. It can also identify gestures. All of these signals can then be translated into actions taken by devices.
In order to identify a person’s age, for example, PredictGaze has trained its software with images of people of various ages. This information can then be used to identify how old you are from an image of you when you stand in front of a camera that’s connected to, say, an iPad running its software.
Similarly, the company trained its software with thousands of images of shushing and non-shushing actions in order to identify that action.
To do gaze tracking, the camera continuously captures your eye movements and uses that data to calculate where you are looking. That can be translated into, say, a character’s movement in a game.
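As a rough illustration of how a stream of gaze estimates might drive something on screen, here is a minimal Python sketch. It is not PredictGaze’s code: the `GazeSmoother` class, the `to_screen` helper, and the assumption that a tracker reports gaze as x and y values between 0 and 1 are all hypothetical, chosen only to show the idea of smoothing noisy estimates before turning them into pixel positions.

```python
from typing import Optional, Tuple


class GazeSmoother:
    """Exponential moving average over normalized gaze estimates.

    Hypothetical helper: assumes some tracker reports gaze as (x, y)
    with both values in the range 0..1.
    """

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._point: Optional[Tuple[float, float]] = None

    def update(self, x: float, y: float) -> Tuple[float, float]:
        # Clamp each estimate to the unit square so outliers cannot
        # push the result off screen.
        x = min(max(x, 0.0), 1.0)
        y = min(max(y, 0.0), 1.0)
        if self._point is None:
            # First sample: nothing to blend with yet.
            self._point = (x, y)
        else:
            px, py = self._point
            a = self.alpha
            # Blend the new estimate with the previous smoothed point.
            self._point = (a * x + (1 - a) * px, a * y + (1 - a) * py)
        return self._point


def to_screen(point: Tuple[float, float], width: int, height: int) -> Tuple[int, int]:
    """Map a normalized point to integer pixel coordinates."""
    return (round(point[0] * (width - 1)), round(point[1] * (height - 1)))
```

A game loop could call `update` once per camera frame and move a character toward the pixel returned by `to_screen`. A smaller `alpha` gives steadier but slower movement.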
“It’s not that we have different technologies that we brought together. That’s a difference. It’s one technology that is able to handle face, gesture, and gaze,” Banjara says.
The company says it processes all data on the device running its software, rather than sending it off to remote servers, and none of the images captured by the cameras are saved. The technology works even in changing light, the company says.
In Banjara’s sparsely decorated living room in Mountain View, California, the PredictGaze team showed me a deluge of demos on a TV set, laptop, iPad, iPhone, and iPod Touch.
In one, Banjara and I sat down in front of a large flat-screen TV connected to an iPad running PredictGaze software. As a Rihanna music video blared on the TV, Banjara stood up and walked away; nothing happened. I walked away, too, and the video paused as the iPad’s front camera determined that nobody was left to watch.
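The pause-when-nobody-is-watching behavior can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the company’s implementation: the `PresencePauser` class, its `grace_seconds` parameter, and the idea that a detector supplies a viewer count for each frame are all hypothetical. The grace period is one common way to keep a brief detection dropout from pausing playback.

```python
from typing import Optional


class PresencePauser:
    """Decide when to pause or resume playback from per-frame viewer counts.

    Hypothetical sketch: assumes a face detector reports how many
    viewers are visible in each camera frame.
    """

    def __init__(self, grace_seconds: float = 2.0) -> None:
        self.grace_seconds = grace_seconds
        self.paused = False
        self._last_seen: Optional[float] = None

    def update(self, viewer_count: int, now: float) -> Optional[str]:
        """Return "pause", "play", or None if nothing should change."""
        if viewer_count > 0:
            # Someone is watching: remember when, and resume if needed.
            self._last_seen = now
            if self.paused:
                self.paused = False
                return "play"
            return None

        # Nobody is visible in this frame.
        if self._last_seen is None:
            # No viewer seen yet: start the clock at the first frame.
            self._last_seen = now
        if not self.paused and now - self._last_seen >= self.grace_seconds:
            self.paused = True
            return "pause"
        return None
```

With two viewers, one leaving changes nothing because the count is still above zero. Playback pauses only after the count has stayed at zero for the full grace period, and resumes as soon as a viewer is detected again.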
In another demo, a PredictGaze team member holding an iPad browsed a Web page and turned the pages of an e-book with his gaze.
In a third demo, two of us sat in front of the iPad, and the software accurately determined how many people were there, what each person’s gender was, whether they were smiling, and more. It could determine my gender even when I wore glasses and covered my hair.
Right now, some of PredictGaze’s efforts are sensitive and finicky: if Banjara and I stayed seated on the couch but turned to each other, Rihanna also stopped singing. And when I played the iPhone game that includes PredictGaze software, it often paused, citing trouble tracking my head motions. But with a bit of tweaking—and inclusion of wide-angle cameras on TVs and tablets that can capture more action—I can imagine PredictGaze’s utility.
The company wants to sell its software development kit to developers for use within their applications. It also plans to sell its technology to electronics companies, which are increasingly selling gadgets that include front-facing cameras, so that PredictGaze can be built right into devices. Eventually, PredictGaze may build its own hardware, too.
But Roel Vertegaal, an associate professor of human-computer interaction at Queen’s University in Kingston, Ontario, and director of the school’s Human Media Lab, isn’t that impressed with PredictGaze’s technology, which he says isn’t new. He and others have been working on such interfaces for years, he says.
Still, he thinks it’s good that such technology is becoming commercially viable.
“I think there is a real need for this kind of thing,” he says.
Back in his living room, Banjara thinks so, too.