IBM Pushes Deep Learning with a Watson Upgrade

IBM is combining different AI techniques, including deep learning, in the commercial version of Watson.

IBM’s Jeopardy!-playing computer system, Watson, combined two separate areas of artificial intelligence research with winning results. Natural language understanding was merged with statistical analysis of vast, unstructured piles of text to find the likely answers to cryptic Jeopardy! clues.

Now IBM aims to add another powerful AI technique, known as deep learning, to the commercial version of Watson. The move could make the platform considerably smarter and more useful, and points to a promising future direction for AI research.

In its effort to commercialize Watson, IBM has made some of the features developed for the Jeopardy! challenge, as well as some new ones, available to developers via a cloud application programming interface (API). It has now added three deep-learning-based features to this Watson API: translation, speech-to-text, and text-to-speech. These could be used to build, for example, apps or websites that offer translation or transcription services. But developers could also connect them to other Watson services that parse questions and search for answers in large amounts of text. This could lead to an app that makes it possible to search large numbers of documents with naturally spoken queries.
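As a rough illustration of how such services might be chained, here is a minimal Python sketch. It does not use the actual Watson API: the three callables (`transcribe`, `find_answer`, `synthesize`) are hypothetical stand-ins for a speech-to-text service, a question-answering service, and a text-to-speech service, and a real app would supply its own clients for each.

```python
from typing import Callable


def answer_spoken_query(
    audio: bytes,
    transcribe: Callable[[bytes], str],   # hypothetical speech-to-text step
    find_answer: Callable[[str], str],    # hypothetical question-answering step
    synthesize: Callable[[str], bytes],   # hypothetical text-to-speech step
) -> bytes:
    """Turn a spoken question into a spoken answer by chaining three services."""
    question = transcribe(audio)      # audio in, text out
    answer = find_answer(question)    # search documents for a likely answer
    return synthesize(answer)         # text in, audio out


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any network service.
    reply = answer_spoken_query(
        b"what is watson",
        transcribe=lambda audio: audio.decode("utf-8"),
        find_answer=lambda question: "You asked: " + question,
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(reply)
```

Passing the services in as plain functions keeps the pipeline independent of any one provider, so any step could be swapped out without touching the others.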

The company has also said that it will collaborate with Yoshua Bengio, a professor at the University of Montreal in Canada and a prominent figure in the field of deep learning.

Deep learning involves training a computer to recognize often complex and abstract patterns by feeding large amounts of data through successive networks of artificial neurons, and refining the way those networks respond to the input. In recent years, the approach has proved very effective for recognizing spoken words or other audio, or classifying visual information (see “Breakthrough Technologies 2013: Deep Learning”).
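The idea of feeding data through layers of artificial neurons and refining their responses can be sketched in a few lines of Python. This is a toy example under stated assumptions, not anything IBM has described: the network has a single hidden layer, it learns the XOR function from four examples, and the layer size, learning rate, and number of passes are arbitrary choices made for illustration. Production systems use far larger networks and datasets.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_inputs, hidden_units, seed=0):
    """Random starting weights; the last entry of each row is a bias."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                for _ in range(hidden_units)]
    w_out = [rng.uniform(-1, 1) for _ in range(hidden_units + 1)]
    return w_hidden, w_out


def forward(x, w_hidden, w_out):
    """Pass one input through the hidden layer, then the output neuron."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
              for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output


def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2
               for x, y in data) / len(data)


def train(data, w_hidden, w_out, epochs=10000, rate=1.0):
    """Refine the weights in place with per-example gradient descent."""
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Error signal at the output, then pushed back to the hidden layer.
            delta_out = (out - y) * out * (1 - out)
            delta_hidden = [delta_out * w_out[j] * h * (1 - h)
                            for j, h in enumerate(hidden)]
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] -= rate * delta_out * h
            for j, row in enumerate(w_hidden):
                for i, v in enumerate(x + [1.0]):
                    row[i] -= rate * delta_hidden[j] * v


if __name__ == "__main__":
    xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    w_hidden, w_out = init_weights(n_inputs=2, hidden_units=3)
    print("error before training:", mean_squared_error(xor, w_hidden, w_out))
    train(xor, w_hidden, w_out)
    print("error after training:", mean_squared_error(xor, w_hidden, w_out))
```

Each pass nudges every weight in the direction that shrinks the gap between the network's output and the desired answer, which is the "refining" step in miniature.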

Deep learning has advanced rapidly in recent years because large quantities of labeled data have become available, especially online, and because powerful parallel graphics processors have proved particularly effective at performing the necessary computations. Some of the world’s largest technology companies are keen to apply deep learning in commercially relevant ways (see “Facebook Launches AI Effort to Find Meaning in Your Posts” and “Is Google Cornering the Market in Deep Learning?”). Google and Facebook have also hired leading figures in deep learning to apply the technology to their businesses.

However, although the results produced by deep learning systems are often spectacular, the systems responsible are extremely specialized, and they can fail in surprising ways because they don’t comprehend the world in a very meaningful way. If deep learning can be combined with other AI techniques effectively, that could produce more rounded, useful systems.

“You can imagine a lot of different use cases,” says Jerome Pesenti, vice president of core technologies for Watson. “Let’s say you have a banking or insurance product, you can talk over the phone and say, ‘Hey, this is my problem,’ and have something that actually interacts back with you automatically, or gives you to an actual human when the system doesn’t know how to answer. That’s the kind of system we’re putting out there right now.”

Combining disparate strands of AI research could become an important trend in coming years.

“A key challenge for modern AI is putting back together a field that has almost splintered among these methodologies,” says James Hendler, director of the Rensselaer Polytechnic Institute for Data Exploration and Applications in Troy, New York. RPI has access to an early version of Watson donated to the university by IBM, and Hendler teaches courses based on the technology. “The key thing about Watson,” he says, “is that it’s inherently about taking many different solutions to things and integrating them to reach a decision.”

Applying learning from one area, such as vision, to another, such as speech, is known as a multimodal approach. It could make future AI systems far more useful and could yield fundamental insights into the nature of intelligence.

When it comes to commercializing such advances, IBM may have a head start, thanks to Watson, on integrating new techniques in useful ways. Pesenti says that his team is already making progress in this area. “If I talk to you about a dog, it’s very hard to have an understanding of what a dog is without having an experience of that dog, which you get through a multimodality view of that,” he says. “We believe down the line that’s actually a really, really big part of our strategy.”