Date: 2022-11-08

This year we’ve seen a dizzying number of breakthroughs in generative AI, from AIs that can produce videos from just a few words to models that can generate audio based on snippets of a song.

Last week, Google held an AI event in its swanky, brand-new offices by the Hudson River in Manhattan. Your correspondent stopped by to see what the fuss was about. In a continuation of current trends, Google announced a slew of advances in generative AI, including a system that combines its two text-to-video AI models, Phenaki and Imagen. Phenaki allows the system to generate video with a series of text prompts that functions as a sort of script, while Imagen makes the videos higher resolution.

But these models are still a long way from being rolled out for the general public to use. They still have some major problems, such as a tendency to generate violent, sexist, racist, or copyright-violating content, owing to the nature of the training data, which is mostly scraped off the internet. One Google researcher told me these models were still in an early stage and that a lot of “stars had to align” before they could be used in actual products. It’s impressive AI research, but it’s also unclear how Google could monetize the technologies.

What could have a real-world impact a lot sooner is Google’s new project to develop a “universal speech model” that has been trained on over 400 languages, Zoubin Ghahramani, vice president of research at Google AI, said at the event. The company didn’t offer many details but said it will publish a paper in the coming months.

If it works out, this will represent a big leap forward in the capabilities of large language models, or LLMs. AI startup Hugging Face’s LLM BLOOM was trained on 46 languages, and Meta has been working on AI models that can translate hundreds of languages in real time. With more languages contributing training data to its model, Google will be able to offer its services to even more people. Incorporating hundreds of languages into one AI model could enable Google to offer better translations or captions on YouTube, or improve its search engine so it’s better at delivering results across more languages.

During my trip to the East Coast, I spoke with top executives at some of the world’s biggest AI labs to hear what they thought was going to be driving the conversation in AI next year. Here’s what they had to say:

Douglas Eck, principal scientist at Google Research and a research director for Google Brain, the company’s deep-learning research team