Bilingual dictionaries are usually a two-way street: you can look up a word in English and find, say, its Spanish equivalent, but you can also do the reverse. Sign-language dictionaries, however, translate only from written words to gestures. This can be hugely frustrating, particularly for parents of deaf children who want to understand unfamiliar gestures, or deaf people who want to interact online using their primary language. So Boston University (BU) researchers are developing a searchable dictionary for sign language, in which any user can enter a gesture into a dictionary’s search engine from her own laptop by signing in front of a built-in camera.
“You might have a collection of sign language in YouTube, and now to search, you have to search in English,” says Stan Sclaroff, a professor of computer science at BU. It’s the equivalent, Sclaroff says, of searching for Spanish text using English translations. “It’s unnatural,” he says, “and it’s not fair.”
Sclaroff is developing the dictionary in collaboration with Carol Neidle, a professor of linguistics at BU, and Vassilis Athitsos, assistant professor of computer science and engineering at the University of Texas at Arlington. Once the user performs a gesture, the dictionary will analyze it and pull up the top five possible matches and meanings.
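To make the "top five matches" idea concrete, here is a minimal sketch of a ranked lookup. It is not the BU team's method: it assumes each sign has already been reduced to a short list of numbers, and the names `top_matches`, `euclidean`, and the toy `signs` table are hypothetical, invented purely for illustration.

```python
import heapq
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_matches(query, dictionary, k=5):
    """Return the k entry names whose features lie closest to the query."""
    scored = ((euclidean(query, feats), name) for name, feats in dictionary.items())
    return [name for _, name in heapq.nsmallest(k, scored)]


# Hypothetical toy data: each sign reduced to two made-up feature values.
signs = {
    "MOTHER": [0.9, 0.1],
    "FATHER": [0.9, 0.3],
    "WATER": [0.2, 0.8],
    "BOOK": [0.5, 0.5],
    "HOUSE": [0.1, 0.2],
    "SCHOOL": [0.6, 0.9],
}

# A query close to MOTHER ranks MOTHER first, then FATHER.
print(top_matches([0.9, 0.15], signs))
```

A real system would extract far richer features from video, but the final step, ranking candidates by similarity and showing the best few, could look much like this.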
“Today’s sign-language recognition is [at] about the stage where speech recognition was 20 years ago,” says Thad Starner, head of the Contextual Computing Group at the Georgia Institute of Technology. Starner’s group has been developing sign-language recognition software for children, using sensor-laden gloves to track hand movements. He and his students have designed educational games in which hearing-impaired children, wearing the gloves, learn sign language. A computer evaluates hand shape and moves on to the next exercise if a child has signed correctly.
Unlike Starner’s work, Sclaroff and Neidle’s project aims for a sensorless system in which anyone with a camera and an Internet connection can learn sign language and interact. The approach, according to Starner, is unique in the field of sign-language recognition, as well as in the field of computer vision.
“This takes a lot of processing power, and trying to deal with sign language in different video qualities is very hard,” says Starner. “So if they’re successful, it would be very cool to actually be able to search the Web in sign language.”
To tackle this stiff challenge, the BU team is asking multiple signers to sit in a studio, one at a time, and sign through 3,000 gestures in a classic American Sign Language (ASL) dictionary. As they sign, four high-speed, high-quality cameras simultaneously pick up front and side views, as well as facial expressions. According to Neidle, smiles, frowns, and raised eyebrows are a largely understudied part of ASL that could offer strong clues to a gesture’s meaning.
As the visual data comes in, Neidle and her students analyze it, marking the start and finish of each sign and identifying key subgestures, units equivalent to English phonemes. Meanwhile, Sclaroff is using this information to develop algorithms that can, say, distinguish the signer’s hands from the background, or recognize hand position and shape and patterns of movement. Given that any individual could sign a word in a slightly different way, the team is analyzing gestures from both native and non-native signers, hoping to develop a computer recognizer that can handle such variations.
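One common way to compare two movements performed at different speeds is dynamic time warping. The article does not say which algorithms the BU team uses, so the sketch below is only an illustration of the general idea. It assumes each sign has been reduced to a sequence of single numbers (for example, hand height per video frame), and `dtw_distance` is a hypothetical helper name.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the best of: advance in a, advance in b, or advance in both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Assumed toy trajectories: the same movement signed fast and slow.
fast = [0, 1, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
print(dtw_distance(fast, slow))  # 0.0: same shape, different tempo
```

Because the alignment is allowed to stretch, the slow and fast versions of the same movement score as identical, while a genuinely different movement would still produce a positive cost.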
The main challenge going forward may be taking into account the many uncontrollable factors on the user’s side of the interface, says Sclaroff. For example, someone using a gesture to enter a search query into a laptop will have a lower-quality camera. The background may be more cluttered than the carefully controlled studio environment in the database samples, and the computer will have to adjust for variables like clothing and skin tone.
“Just to produce the sign and look it up: that’s the real novelty we’re trying to accomplish,” says Neidle. “That would be an improvement over anything that exists now.”