A new immersive classroom uses AI and VR to teach Mandarin Chinese

Students will learn the language by ordering food or haggling with street vendors on a virtual Beijing street.

Karen Hao

July 16, 2019

An image of students standing in the immersive virtual environment.
Rensselaer Polytechnic Institute

Often the best way to learn a language is to immerse yourself in an environment where people speak it. The constant exposure, along with the pressure to communicate, helps you swiftly pick up and practice new vocabulary. But not everyone gets the opportunity to live or study abroad.

In a new collaboration with IBM Research, Rensselaer Polytechnic Institute (RPI), a university based in Troy, New York, now offers its students studying Chinese another option: a 360-degree virtual environment that teleports them to the busy streets of Beijing or a crowded Chinese restaurant. Students get to haggle with street vendors or order food, and the environment is equipped with different AI capabilities to respond to them in real time. While the classroom is largely experimental, it is being used for the first time in a six-week, for-credit course at the university this summer.

The project was inspired by two RPI faculty members who often used role-playing games to help their students learn Chinese. In parallel, over the last few years, several studies have found that interactive learning environments can increase language understanding and retention. One study published in 2018 also found that learning Japanese in a 3D virtual environment made students more likely to pick up vocabulary that they encountered incidentally through the simulation. On the basis of these ideas, the professors struck up a collaboration with IBM Research to explore whether they could replicate such benefits for their own students.

In addition to surrounding the students with digital projections of a scene, the environment uses several types of sensors to dynamically adapt to the students’ words and actions. Microphones, worn by the participants, feed their audio directly into speech-recognition algorithms. Cameras track their movements and gestures to register when they point to various objects or walk up to different virtual agents. If a student points to a food dish in the restaurant scene and asks what it is, for example, a virtual agent can respond with the name and description. Narrative-generation technology also allows each agent to construct more sophisticated answers to off-the-cuff questions (“What’s the dish’s history?”) using knowledge from Wikipedia. (The conversation topics are still somewhat constrained, however, to whatever task the student is trying to complete.)

Cameras and sensors track the students' gestures so they can freely point at objects in a scene to engage.
Rensselaer Polytechnic Institute

Many of the technologies in the environment are commercially available products that were woven together into a cohesive experience. But a few had to be developed for the project specifically. Mandarin Chinese has four main tones plus a neutral tone, for example, and they are challenging for many new learners but crucial to conveying meaning. Say the word “sell” (卖 mài) a little bit off and you could end up saying “buy” (买 mǎi) instead. So the researchers created an algorithm to analyze the tones in the students’ pronunciation. It compares them with those of native speakers, shows where they differ, and then provides audio and visual feedback directly in the environment. This lets students ask a virtual agent how to say something and immediately start practicing the new vocabulary.
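
To make the comparison step concrete, here is a minimal Python sketch of how a learner's tone could be checked against a native speaker's. The researchers' actual algorithm is not described here, so everything in the sketch is an assumption: the function names (`resample`, `center`, `tone_differences`), the idea of representing a syllable's pitch as a short list of values in semitones, and the 2-semitone threshold are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def resample(contour: List[float], n: int) -> List[float]:
    """Linearly interpolate a pitch contour to n evenly spaced points."""
    if not contour:
        raise ValueError("contour must not be empty")
    if n < 2:
        raise ValueError("n must be at least 2")
    if len(contour) == 1:
        return [contour[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(contour) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def center(contour: List[float]) -> List[float]:
    """Subtract the mean so speakers with different voice ranges are comparable."""
    mean = sum(contour) / len(contour)
    return [v - mean for v in contour]


def tone_differences(
    student: List[float],
    native: List[float],
    n: int = 5,
    threshold: float = 2.0,  # hypothetical tolerance, in semitones
) -> List[Tuple[int, float]]:
    """Return (position, student minus native) wherever the contours diverge."""
    s = center(resample(student, n))
    r = center(resample(native, n))
    return [
        (i, round(a - b, 2))
        for i, (a, b) in enumerate(zip(s, r))
        if abs(a - b) > threshold
    ]


# Made-up contours (semitones): a falling tone, as in mài, versus a
# learner's dipping contour that sounds closer to mǎi.
native = [6.0, 4.0, 2.0, 0.0, -2.0]
student = [2.0, 0.0, -1.0, 0.0, 2.0]
print([i for i, _ in tone_differences(student, native)])  # [0, 1, 4]
```

In this toy example the learner starts too low and ends too high, so the start and end of the syllable are flagged. Those flagged positions are the kind of information that visual feedback could highlight for the student.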

Hui Su, the director of the Cognitive and Immersive Systems Laboratory, the collaboration between IBM Research and RPI that led the initiative, says his team is still in the early stages of understanding how effective it is. But in a pilot at the end of 2017, the researchers found qualitatively that it increased the students’ engagement and enjoyment in language learning, and helped them to quickly acquire new words.

Prior to a restaurant-ordering exercise, for example, the students were not taught how to pay for their food, but through observing their peers and conversing with the virtual agents, many picked up the necessary vocabulary to do so. “It was a bit of a surprise,” says Su. “One of the students commented that this should be the way to teach language,” he adds.

An algorithm gives students visual feedback on their tone pronunciation.
Rensselaer Polytechnic Institute

In the first year, the new course will use the virtual environment nearly half the time and a traditional classroom the rest, although this arrangement might change in the future.

If the class provides strong evidence of improving student learning outcomes, it could serve as a model for others. The most obvious idea would be to extend it to other languages. But it could also be used beyond universities to coach executives, train government staff, or conduct any other preparation activities that might benefit from scenario simulation and role play.

Ultimately the initiative will aid the researchers’ longer-term mission to understand how cognitive and immersive environments can affect learning, collaboration, and sense-making, says Su.
