
Sign-Language Translator

The first sign-language dictionary that’s searchable by gesture.
January 12, 2009

Bilingual dictionaries are usually a two-way street: you can look up a word in English and find, say, its Spanish equivalent, but you can also do the reverse. Sign-language dictionaries, however, translate only from written words to gestures. This can be hugely frustrating, particularly for parents of deaf children who want to understand unfamiliar gestures, or deaf people who want to interact online using their primary language. So Boston University (BU) researchers are developing a searchable dictionary for sign language, in which any user can enter a gesture into a dictionary’s search engine from her own laptop by signing in front of a built-in camera.

Searching by sign: Researchers in Boston are designing the first sign-language dictionary searchable by gesture. A signer (pictured) sits in a studio equipped with high-speed cameras that capture hand motions and facial expressions. Videos on a laptop prompt her to make particular signs. Video of the signer will be used to train algorithms to identify gestural patterns.

“You might have a collection of sign language in YouTube, and now to search, you have to search in English,” says Stan Sclaroff, a professor of computer science at BU. It’s the equivalent, Sclaroff says, of searching for Spanish text using English translations. “It’s unnatural,” he says, “and it’s not fair.”

Sclaroff is developing the dictionary in collaboration with Carol Neidle, a professor of linguistics at BU, and Vassilis Athitsos, an assistant professor of computer science and engineering at the University of Texas at Arlington. Once the user performs a gesture, the dictionary will analyze it and pull up the top five possible matches and meanings.
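
The retrieval step can be pictured as a nearest-neighbor lookup over gesture features. The sketch below is purely illustrative and not the BU team's code: it assumes every dictionary sign and the user's query have already been reduced to fixed-length feature vectors (a hypothetical preprocessing step) and simply returns the five closest entries.

```python
# Illustrative sketch only -- not the BU system. Assumes each dictionary sign
# and the query gesture are already summarized as fixed-length feature vectors
# (e.g., describing hand position, shape, and movement over time).
import numpy as np

def top_matches(query_vec, sign_vectors, sign_names, k=5):
    """Return the k dictionary signs whose feature vectors are closest to the query."""
    dists = np.linalg.norm(sign_vectors - query_vec, axis=1)  # distance to every entry
    best = np.argsort(dists)[:k]                              # indices of the k nearest
    return [(sign_names[i], float(dists[i])) for i in best]

# Toy usage: a 3,000-entry dictionary of 64-dimensional feature vectors.
rng = np.random.default_rng(0)
sign_vectors = rng.normal(size=(3000, 64))
sign_names = [f"sign_{i}" for i in range(3000)]
print(top_matches(rng.normal(size=64), sign_vectors, sign_names))  # five ranked candidates
```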

“Today’s sign-language recognition is [at] about the stage where speech recognition was 20 years ago,” says Thad Starner, head of the Contextual Computing Group at the Georgia Institute of Technology. Starner’s group has been developing sign-language recognition software for children, using sensor-laden gloves to track hand movements. He and his students have designed educational games in which hearing-impaired children, wearing the gloves, learn sign language. A computer evaluates hand shape and moves on to the next exercise if a child has signed correctly.

Unlike Starner's work, Sclaroff and Neidle's project aims for a sensorless system in which anyone with a camera and an Internet connection can learn sign language and interact. The approach, according to Starner, is unique in the field of sign-language recognition, as well as in the broader field of computer vision.

“This takes a lot of processing power, and trying to deal with sign language in different video qualities is very hard,” says Starner. “So if they’re successful, it would be very cool to actually be able to search the Web in sign language.”

To tackle this stiff challenge, the BU team is asking multiple signers to sit in a studio, one at a time, and sign through 3,000 gestures in a classic American Sign Language (ASL) dictionary. As they sign, four high-speed, high-quality cameras simultaneously pick up front and side views, as well as facial expressions. According to Neidle, smiles, frowns, and raised eyebrows are a largely understudied part of ASL that could offer strong clues to a gesture’s meaning.

As the visual data comes in, Neidle and her students analyze it, marking the start and finish of each sign and identifying key subgestures, units equivalent to English phonemes. Meanwhile, Sclaroff is using this information to develop algorithms that can, say, distinguish the signer's hands from the background, or recognize hand position, hand shape, and patterns of movement. Because different signers may produce the same word in slightly different ways, the team is analyzing gestures from both native and non-native signers, hoping to develop a computer recognizer that can handle such variation.
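
As a concrete illustration of the "hands from the background" step, a common baseline in computer vision at the time was simple skin-color thresholding. The sketch below, which uses OpenCV, is an assumption about what such a front end might look like rather than the team's actual algorithm; the color bounds are rough, illustrative values that a real system would have to adapt to each signer and to lighting conditions.

```python
# Illustrative baseline only -- not the BU algorithms. A rough skin-color
# threshold in YCrCb space picks out candidate hand and face regions in a frame.
# Requires OpenCV 4.x (pip install opencv-python) and NumPy.
import cv2
import numpy as np

def candidate_skin_regions(frame_bgr, min_area=500):
    """Return bounding boxes (x, y, w, h) of skin-colored blobs in a BGR frame."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    # Rough, illustrative skin bounds on the Cr/Cb channels.
    mask = cv2.inRange(ycrcb, np.array([0, 133, 77]), np.array([255, 173, 127]))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))  # remove speckle
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area]

# Usage: boxes = candidate_skin_regions(cv2.imread("studio_frame.png"))
```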

The main challenge going forward may be taking into account the many uncontrollable factors on the user’s side of the interface, says Sclaroff. For example, someone using a gesture to enter a search query into a laptop will have a lower-quality camera. The background may be more cluttered than the carefully controlled studio environment in the database samples, and the computer will have to adjust for variables like clothing and skin tone.

“Just to produce the sign and look it up, that’s the real novelty we’re trying to accomplish,” says Neidle. “That would be an improvement over anything that exists now.”
