Sign-Language Translator

The first sign-language dictionary that’s searchable by gesture.

Bilingual dictionaries are usually a two-way street: you can look up a word in English and find, say, its Spanish equivalent, but you can also do the reverse. Sign-language dictionaries, however, translate only from written words to gestures. This can be hugely frustrating, particularly for parents of deaf children who want to understand unfamiliar gestures, or deaf people who want to interact online using their primary language. So Boston University (BU) researchers are developing a searchable dictionary for sign language, in which any user can enter a gesture into a dictionary’s search engine from her own laptop by signing in front of a built-in camera.

Searching by sign: Researchers in Boston are designing the first sign-language dictionary searchable by gesture. A signer (pictured) sits in a studio equipped with high-speed cameras that capture hand motions and facial expressions. Videos on a laptop prompt her to make particular signs. Video of the signer will be used to train algorithms to identify gestural patterns.

“You might have a collection of sign language in YouTube, and now to search, you have to search in English,” says Stan Sclaroff, a professor of computer science at BU. It’s the equivalent, Sclaroff says, of searching for Spanish text using English translations. “It’s unnatural,” he says, “and it’s not fair.”

Sclaroff is developing the dictionary in collaboration with Carol Neidle, a professor of linguistics at BU, and Vassilis Athitsos, assistant professor of computer science and engineering at the University of Texas at Arlington. Once the user performs a gesture, the dictionary will analyze it and pull up the top five possible matches and meanings.
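The article does not say how the dictionary will rank candidate signs. As a rough illustration only, a "top five matches" lookup can be sketched as a nearest-neighbor search. The sketch assumes each sign has already been reduced to a fixed-length feature vector; the feature values, the `SIGN_DB` table, and the `top_matches` function are all hypothetical and are not the BU team's method.

```python
import heapq
import math

# Hypothetical toy database: sign label -> made-up feature vector.
# A real system would derive features from video (hand shape, motion, etc.).
SIGN_DB = {
    "HELLO": [0.9, 0.1, 0.3],
    "THANK-YOU": [0.8, 0.2, 0.4],
    "PLEASE": [0.2, 0.9, 0.5],
    "SORRY": [0.1, 0.8, 0.6],
    "YES": [0.5, 0.5, 0.9],
    "NO": [0.4, 0.4, 0.1],
    "HELP": [0.7, 0.6, 0.2],
}


def top_matches(query, database, k=5):
    """Return the k labels whose vectors are closest to the query.

    Closeness is plain Euclidean distance, chosen here only for simplicity.
    """
    scored = ((math.dist(query, vec), label) for label, vec in database.items())
    return [label for _, label in heapq.nsmallest(k, scored)]


if __name__ == "__main__":
    # A query identical to the stored HELLO vector should rank HELLO first.
    print(top_matches([0.9, 0.1, 0.3], SIGN_DB))
```

Euclidean distance over a single vector ignores how a sign unfolds over time, so a practical recognizer would need a richer comparison than this.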

“Today’s sign-language recognition is [at] about the stage where speech recognition was 20 years ago,” says Thad Starner, head of the Contextual Computing Group at the Georgia Institute of Technology. Starner’s group has been developing sign-language recognition software for children, using sensor-laden gloves to track hand movements. He and his students have designed educational games in which hearing-impaired children, wearing the gloves, learn sign language. A computer evaluates hand shape and moves on to the next exercise if a child has signed correctly.

Unlike Starner’s work, Sclaroff and Neidle’s project aims for a sensorless system in which anyone with a camera and an Internet connection can learn sign language and interact. The approach, according to Starner, is unique in the field of sign-language recognition, as well as in the field of computer vision.

“This takes a lot of processing power, and trying to deal with sign language in different video qualities is very hard,” says Starner. “So if they’re successful, it would be very cool to actually be able to search the Web in sign language.”

To tackle this stiff challenge, the BU team is asking multiple signers to sit in a studio, one at a time, and sign through 3,000 gestures in a classic American Sign Language (ASL) dictionary. As they sign, four high-speed, high-quality cameras simultaneously pick up front and side views, as well as facial expressions. According to Neidle, smiles, frowns, and raised eyebrows are a largely understudied part of ASL that could offer strong clues to a gesture’s meaning.

As the visual data comes in, Neidle and her students analyze it, marking the start and finish of each sign and identifying key subgestures, units equivalent to English phonemes. Meanwhile, Sclaroff is using this information to develop algorithms that can, say, distinguish the signer’s hands from the background, or recognize hand position and shape and patterns of movement. Given that any individual could sign a word in a slightly different way, the team is analyzing gestures from both native and non-native signers, hoping to develop a computer recognizer that can handle such variations.

The main challenge going forward may be taking into account the many uncontrollable factors on the user’s side of the interface, says Sclaroff. For example, someone using a gesture to enter a search query into a laptop will have a lower-quality camera. The background may be more cluttered than the carefully controlled studio environment in the database samples, and the computer will have to adjust for variables like clothing and skin tone.

“Just to produce the sign and look it up: that’s the real novelty we’re trying to accomplish,” says Neidle. “That would be an improvement over anything that exists now.”
