Seeing by Sound

A new wearable computer can transform cities and buildings into soundscapes, researchers say, helping visually impaired people get around more easily.

Susan Nasr

August 16, 2006

Blind people traversing a city face a formidable challenge: quickly and safely navigating a complex environment. Researchers at the Georgia Institute of Technology say their wearable computer provides the newest high-tech solution.

Researchers Frank Dellaert (left) and Bruce Walker (right) hold a GPS and computer vision system for guiding people with blindness. Dellaert holds two versions of the hat-mounted cameras that function as the system’s “eyes” during navigation indoors. (Credit: Rob Felt, Georgia Institute of Technology)

The system’s hardware includes two Global Positioning System (GPS) receivers, a laptop, head and body compasses, a gyroscope-based tracker that measures the head’s tilt, and four small cameras mounted on a helmet. For audio (the device uses a speech interface), users listen to “bone phones,” which fit behind the ears and transmit sound by vibrating against the skull. A user’s ears are thus free to listen to important ambient noise, such as city traffic. The whole system weighs about three pounds, and most parts tuck neatly into a backpack.

The device uses GPS and digital maps to guide the wearer to a destination. Outdoors, GPS pinpoints a user’s location. Users verbally tell the device where they want to go, and the system wirelessly extracts an area map, which includes everything from businesses to bushes, from a remote Geographic Information System (GIS) database. Then, “sound beacons,” soft tones emanating in stereo through the bone phones, guide the person to a destination.

“Imagine there’s a ring around your head a meter away from your body,” explains Bruce Walker, assistant professor of psychology and designer of the auditory interface. “If you need to walk straight, the sound will come from straight ahead. If you need to turn a corner, the sound will seem to come from the right. Turn your body until the sound is in front of you again – and away you go.” And the tones speed up as users approach their destination.
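Walker's description can be illustrated with a minimal sketch. It is not the Georgia Tech implementation: the function names, the sine-based stereo pan, and the distance-to-tempo mapping (including the 50-meter range and the pulse intervals) are all assumptions chosen for illustration. It shows how a beacon could be placed relative to the wearer's heading and how the tone rate could rise as the destination nears.

```python
import math

def relative_bearing(heading_deg, target_bearing_deg):
    """Signed angle from the wearer's heading to the target, in (-180, 180].
    Positive values mean the target is to the right."""
    diff = (target_bearing_deg - heading_deg) % 360.0
    if diff > 180.0:
        diff -= 360.0
    return diff

def beacon_cue(heading_deg, target_bearing_deg, distance_m,
               min_interval_s=0.2, max_interval_s=1.5, far_m=50.0):
    """Return (angle, pan, interval) for one beacon tone.

    pan is -1.0 (fully left) to +1.0 (fully right). A plain stereo pan
    cannot tell front from back; a real spatial-audio renderer would.
    interval is the gap between tones; it shrinks as distance shrinks.
    All parameter values are hypothetical.
    """
    angle = relative_bearing(heading_deg, target_bearing_deg)
    pan = math.sin(math.radians(angle))
    # Clamp distance to [0, far_m] and scale the pulse interval linearly.
    frac = min(max(distance_m / far_m, 0.0), 1.0)
    interval = min_interval_s + frac * (max_interval_s - min_interval_s)
    return angle, pan, interval

if __name__ == "__main__":
    # Wearer faces north (0 degrees); the next waypoint lies due east, 10 m away.
    print(beacon_cue(0.0, 90.0, 10.0))
```

Turning the body until the returned angle is near zero corresponds to Walker's instruction to turn "until the sound is in front of you again."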

The navigation is precise without requiring bulky antennas, the researchers say. By combining data from multiple GPS receivers and other location sensors, then accounting for error in the devices’ estimates, the system pinpoints users’ locations much more accurately than GPS alone, to within a foot of where they really are.
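The article does not say how the estimates are combined. One standard way to merge several noisy position readings while accounting for each sensor's error is inverse-variance weighting, sketched below under the assumption that the errors are independent and roughly Gaussian. The function name and the sample numbers are hypothetical.

```python
def fuse_estimates(estimates):
    """Fuse independent 2-D position estimates by inverse-variance weighting.

    estimates: list of (x, y, sigma) tuples, where sigma is the standard
    deviation of that sensor's error in the same units as x and y.
    Returns (x, y, sigma_fused). Less certain sensors get less weight,
    and the fused sigma is never larger than the best single sensor's.
    """
    weights = [1.0 / (sigma * sigma) for _, _, sigma in estimates]
    total = sum(weights)
    x = sum(w * e[0] for w, e in zip(weights, estimates)) / total
    y = sum(w * e[1] for w, e in zip(weights, estimates)) / total
    return x, y, (1.0 / total) ** 0.5

if __name__ == "__main__":
    # Two hypothetical receivers with different error levels (in meters).
    print(fuse_estimates([(0.0, 0.0, 1.0), (10.0, 0.0, 3.0)]))
```

The fused position sits much closer to the more reliable reading, which is the general idea behind getting better accuracy from several sensors than from any one alone.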

Although GPS loses its signal indoors and between tall buildings, the cameras, which are part of a computer vision system, pick up the slack. “By having computer vision on board, we can go where GPS can’t,” says Frank Dellaert, assistant professor of computing at Georgia Tech. Indoors, the cameras “see” building interiors as lines and patches of color. The computer searches a stored, digitized floor plan of the building for these shapes and pinpoints the user’s location by matching the cameras’ input to a spot on the plan. Sound beacons then guide users as they do outside.

In addition to guiding users, the system describes what is around them. It aims for an easy, literal translation of the environment into sound, says Dellaert. Surrounding objects sound like what they are. As users pass a park, for instance, the sound of wind blowing through trees comes through the bone phones from the direction of the park. Indoors, knocking sounds announce doors. Objects that make no sound in real life can be represented by compacted words, says Walker. “‘Mom’s house’ becomes ‘m’souse,’” for example, and the system can learn to compact new words.

The Georgia team’s device joins a number of other high-tech solutions designed to help blind people get around. The most common ones use a single GPS receiver, GIS maps, and spoken, turn-right, turn-left directions to guide people along routes. But single GPS units, with an error radius of up to 30 feet, can be dangerously imprecise for pedestrians, says Walker, and provide no help indoors.

Experimental solutions, like the city of San Francisco’s Talking Signs and the University of Florida’s DRISHTI system, make cities smart by pasting information-carrying RFID tags on doors, exits, sidewalks, and street signs. When blind people walk by the tags with a reader, the objects announce themselves. This eliminates the need for GPS and maps, says Sumi Helal, professor of computer and information science and engineering at the University of Florida and head of the DRISHTI project. But it’s impractical and expensive to put tags everywhere.

The Georgia team’s sound beacon system is ideal for navigation, says Jack Loomis, professor of psychology at the University of California, Santa Barbara. His research shows that sounds projected in 3-D help visually impaired users navigate faster and more accurately than spoken, turn-by-turn directions. His group’s Personal Guidance System, a navigation system similar to the Georgia team’s, projects words in 3-D space for users to follow.

A smaller, smarter version of the current prototype is on the way, the Georgia researchers say. The computer vision cameras will be scaled down to fit on a pair of glasses, and a cell phone or PDA will replace the laptop. In the far future, says Dellaert, the computer vision system will draw its own maps of building interiors, solving the “biggest downside” of the system: currently, the researchers must manually make digital maps of buildings from data or floor plans before a user’s visit.

The team recently tested the sound beacons on blindfolded students who navigated by joystick around a computer maze. “In about four minutes, they got it,” says Walker. “After 20, they were moving quickly through complex paths.” Next month, the team plans to test the full hardware on blind users navigating around the Georgia Institute of Technology campus.
