As I walked around my office on a recent morning, a female voice on my iPhone narrated the objects I passed. “Brick,” “wall,” “telephone,” she said matter-of-factly. The voice paused when I came upon a bike hung on a wall-mounted rack, then intoned, “bicycle.”
The voice is part of a free image-recognition app called Aipoly that’s trying to make it easier for those with vision impairments to recognize their surroundings. To use it, you point the phone’s rear camera at whatever you want it to identify, and Aipoly will speak what it sees (or, at least, what it thinks it sees) and show the object’s name on the phone’s display. Aipoly runs directly on your phone, so it doesn’t need Internet access to work, and it can identify one object after another as you move the phone around, without requiring you to snap a photo of each thing.
The creators of Aipoly hope the app can help people with severe vision impairments, and perhaps people trying to learn a new language. They also hope it will be faster than other visual-assistance apps, such as Be My Eyes, which relies on help from other people, or TapTapSee, which requires an Internet connection.
The app was rolled out for the iPhone early this year by a Melbourne-based startup of the same name. Aipoly cofounder Simon Edwardsson says it recognizes images using deep learning, a machine-learning technique inspired by studies of the brain. It's the same technology Facebook uses to recognize faces and Google uses for image search.
The app figures out what something is by breaking an image down into visual features, such as curves, lines, and patterns like stripes, and then using those features to estimate the likelihood that the image shows a particular object.
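To make that last step concrete, here is a toy Python sketch of how feature strengths can be turned into per-object probabilities: each label gets a weighted score, a softmax converts the scores into likelihoods that sum to 1, and the most likely label wins. It is not Aipoly's code. The feature names, labels, and weights are hypothetical, and in a real deep-learning model both the features and the weights are learned from many labeled images rather than chosen by hand.

```python
import math


def softmax(scores):
    # Subtract the largest score before exponentiating for numerical stability;
    # the resulting probabilities are unchanged.
    m = max(scores.values())
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def classify(features, weights):
    # Score each label as a weighted sum of detected feature strengths,
    # a stand-in for the final layer of a trained network.
    scores = {
        label: sum(w * features.get(name, 0.0) for name, w in label_weights.items())
        for label, label_weights in weights.items()
    }
    probs = softmax(scores)
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical feature strengths "extracted" from one camera frame.
features = {"curves": 0.9, "straight_lines": 0.2, "stripes": 0.1}

# Hypothetical hand-picked weights; a real model learns these from data.
weights = {
    "mouse": {"curves": 2.0, "straight_lines": -0.5},
    "wall": {"straight_lines": 1.5, "curves": -1.0},
    "zebra": {"stripes": 3.0},
}

label, probs = classify(features, weights)
print(label, round(probs[label], 2))
```

Because the softmax always produces a full probability distribution, a model like this names some object even when none of its known labels truly fits, which is one way misfires like the ones described below can arise.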
Aipoly was able to identify plenty of the things around my office, though it still needs a lot of work. So far it can recognize only about 1,000 objects. Edwardsson says the company is trying to raise that to 5,000, though even that isn't many, considering how many distinct things a person comes across every day.
And while the app was frequently correct, even when an object was partly hidden (such as headphones slung around my colleague's neck), there were plenty of misfires, too.
For example, its female voice said “Chevrolet,” then “wheel,” and, finally, “mouse” while the phone was focused on my computer mouse. And the app kept misidentifying a microwave as “AC.”
Users can train Aipoly by typing the correct word or phrase for an object it gets wrong. Edwardsson says these corrections are uploaded to Aipoly's servers, and the startup adds the new descriptions every few weeks or so as it releases new versions of the app.
Jeff Bigham, an associate professor of human-computer interaction at Carnegie Mellon University who studies assistive technologies, says Aipoly seems to work pretty well and could help users get a sense of their surroundings as they move the phone around them. But he questions how useful some of its object-identifying skills are, since people tend to be good at figuring out what something is by touching it.
“What’s a lot more useful is telling things apart that are otherwise indistinguishable,” he says, like cans or boxes of food that feel about the same.