Bilingual dictionaries are usually a two-way street: you can look up a word in English and find, say, its Spanish equivalent, but you can also do the reverse. Sign-language dictionaries, however, translate only from written words to gestures. This can be hugely frustrating, particularly for parents of deaf children who want to understand unfamiliar gestures, or deaf people who want to interact online using their primary language. So Boston University (BU) researchers are developing a searchable dictionary for sign language, in which any user can enter a gesture into a dictionary’s search engine from her own laptop by signing in front of a built-in camera.
“You might have a collection of sign language in YouTube, and now to search, you have to search in English,” says Stan Sclaroff, a professor of computer science at BU. It’s the equivalent, Sclaroff says, of searching for Spanish text using English translations. “It’s unnatural,” he says, “and it’s not fair.”
Sclaroff is developing the dictionary in collaboration with Carol Neidle, a professor of linguistics at BU, and Vassilis Athitsos, assistant professor of computer science and engineering at the University of Texas at Arlington. Once the user performs a gesture, the dictionary will analyze it and pull up the top five possible matches and meanings.
“Today’s sign-language recognition is [at] about the stage where speech recognition was 20 years ago,” says Thad Starner, head of the Contextual Computing Group at the Georgia Institute of Technology. Starner’s group has been developing sign-language recognition software for children, using sensor-laden gloves to track hand movements. He and his students have designed educational games in which hearing-impaired children, wearing the gloves, learn sign language. A computer evaluates hand shape and moves on to the next exercise if a child has signed correctly.
Unlike Starner’s glove-based work, Sclaroff and Neidle’s project aims for a sensorless system in which anyone with a camera and an Internet connection can learn sign language and interact. The approach, according to Starner, is unique in the field of sign-language recognition, as well as in the field of computer vision.
“This takes a lot of processing power, and trying to deal with sign language in different video qualities is very hard,” says Starner. “So if they’re successful, it would be very cool to actually be able to search the Web in sign language.”
To tackle this stiff challenge, the BU team is asking multiple signers to sit in a studio, one at a time, and sign through 3,000 gestures in a classic American Sign Language (ASL) dictionary. As they sign, four high-speed, high-quality cameras simultaneously pick up front and side views, as well as facial expressions. According to Neidle, smiles, frowns, and raised eyebrows are a largely understudied part of ASL that could offer strong clues to a gesture’s meaning.
As the visual data comes in, Neidle and her students analyze it, marking the start and finish of each sign and identifying key subgestures, units roughly equivalent to English phonemes. Meanwhile, Sclaroff is using this information to develop algorithms that can, say, distinguish the signer’s hands from the background, or recognize hand position, hand shape, and patterns of movement. Because any individual may sign a word in a slightly different way, the team is analyzing gestures from both native and non-native signers, hoping to develop a computer recognizer that can handle such variations.
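The article does not say how the BU recognizer compares a query gesture with the stored signs. Purely as an illustration, the sketch below uses dynamic time warping (DTW), a standard technique for comparing sequences performed at different speeds, to rank stored signs against a query and return a shortlist. Everything in it is an assumption made for the example: the names `dtw_distance`, `top_matches`, and `toy_dictionary`, the representation of a sign as a list of (x, y) hand positions per video frame, and the toy data itself.

```python
import heapq
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of (x, y) points."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a frame of b
                                 cost[i][j - 1],      # repeat a frame of a
                                 cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]


def top_matches(query, dictionary, k=5):
    """Return up to k sign labels, ordered from closest to farthest match."""
    scored = ((dtw_distance(query, traj), label) for label, traj in dictionary.items())
    return [label for _, label in heapq.nsmallest(k, scored)]


# Hypothetical toy data: each "sign" is just a hand trajectory, one point per frame.
toy_dictionary = {
    "UP": [(0, 0), (0, 1), (0, 2), (0, 3)],
    "RIGHT": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "DIAGONAL": [(0, 0), (1, 1), (2, 2), (3, 3)],
}

# The same upward motion as "UP", signed more slowly (some frames repeat).
slow_up = [(0, 0), (0, 0), (0, 1), (0, 2), (0, 2), (0, 3)]

print(top_matches(slow_up, toy_dictionary))
```

Because DTW lets one frame align with several, the slower query still matches the stored "UP" trajectory exactly, which is the kind of signer-to-signer variation the paragraph above describes. A real system would also need hand shape, facial expression, and far more robust features than raw positions.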
The main challenge going forward may be taking into account the many uncontrollable factors on the user’s side of the interface, says Sclaroff. For example, someone using a gesture to enter a search query into a laptop will probably have a lower-quality camera than the ones in the studio. The background may be more cluttered than the carefully controlled studio environment of the database samples, and the computer will have to adjust for variables like clothing and skin tone.
“Just to produce the sign and look it up–that’s the real novelty we’re trying to accomplish,” says Neidle. “That would be an improvement over anything that exists now.”