Google Reveals a Powerful New AI Chip and Supercomputer

The new chip and a cloud-based machine-learning supercomputer will help Google establish itself as an AI-focused hardware maker.

Will Knight

May 17, 2017

If artificial intelligence is rapidly eating software, then Google may have the biggest appetite around.

At the company’s annual developer conference today, CEO Sundar Pichai announced a new computer processor designed to perform the kind of machine learning that has taken the industry by storm in recent years (see “10 Breakthrough Technologies: Deep Learning”).

The announcement reflects how rapidly artificial intelligence is transforming Google itself, and it is the surest sign yet that the company plans to lead the development of every relevant aspect of software and hardware.

Perhaps most important, at least for those working in machine learning, the new processor not only runs trained models at blistering speed; it can also be used to train them very efficiently. Called the Cloud Tensor Processing Unit, the chip is named after Google’s open-source TensorFlow machine-learning framework.

Training is a fundamental part of machine learning. To create an algorithm capable of recognizing hot dogs in images, for example, you would feed in thousands of examples of hot-dog images—along with not-hot-dog examples—until it learns to recognize the difference. But the calculations required to train a large model are so vastly complex that training might take days or weeks.
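To make the idea of training concrete, here is a minimal sketch in Python. It assumes each image has already been boiled down to two made-up numeric features, and it fits a simple logistic-regression classifier by repeatedly nudging its weights toward the labeled examples. Real image recognizers learn from raw pixels with deep neural networks, which is where the heavy computation comes from. The data, feature values, and function names below are all hypothetical.

```python
import math
import random


def make_dataset(n_per_class=20, seed=0):
    """Build a synthetic, hypothetical dataset of (features, label) pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        # Label 1 = "hot dog": features cluster around (0.8, 0.7).
        data.append(([rng.gauss(0.8, 0.05), rng.gauss(0.7, 0.05)], 1))
        # Label 0 = "not hot dog": features cluster around (0.2, 0.3).
        data.append(([rng.gauss(0.2, 0.05), rng.gauss(0.3, 0.05)], 0))
    rng.shuffle(data)
    return data


def score(weights, bias, x):
    """Return the model's estimated probability that x is a hot dog."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient descent on the labeled examples."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            # How far the current prediction is from the true label.
            error = score(weights, bias, x) - y
            # Move each weight a small step in the direction that shrinks the error.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def is_hot_dog(weights, bias, x):
    """Classify x as a hot dog when the estimated probability is at least 0.5."""
    return score(weights, bias, x) >= 0.5


weights, bias = train(make_dataset())
print(is_hot_dog(weights, bias, [0.8, 0.7]))
print(is_hot_dog(weights, bias, [0.2, 0.3]))
```

This toy model has three adjustable numbers and 40 examples, so it trains in a fraction of a second. A large model runs the same kind of loop over millions of parameters and examples, which is why training can stretch to days or weeks.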

Pichai also announced the creation of machine-learning supercomputers, or Cloud TPU pods, based on clusters of Cloud TPUs wired together with high-speed data connections. And he said Google was creating the TensorFlow Research Cloud, consisting of thousands of TPUs accessible over the Internet.

“We are building what we think of as AI-first data centers,” Pichai said during his presentation. “Cloud TPUs are optimized for both training and inference. This lays the foundation for significant progress [in AI].”

Google will make 1,000 Cloud TPU systems available to artificial intelligence researchers willing to openly share details of their work.

Pichai also announced a number of AI research initiatives during his speech. These include an effort to develop algorithms capable of learning how to do the time-consuming work involved with fine-tuning other machine-learning algorithms. And he said Google was developing AI tools for medical image analysis, genomic analysis, and molecule discovery.

Speaking ahead of the announcements, Jeff Dean, a senior fellow at Google, said this offering might help advance AI. “Many top researchers don’t have access to as much computer power as they would like,” he noted.

Google’s move into AI-focused hardware and cloud services is driven, in part, by efforts to speed up its own operations. Google itself now uses TensorFlow to power search, speech recognition, translation, and image processing. It was also used in the Go-playing program, AlphaGo, developed by another Alphabet subsidiary, DeepMind.

But strategically, Google could help prevent another hardware company from becoming too dominant in the machine-learning space. Nvidia, a company that makes the graphics processing chips that have traditionally been used for deep learning, is becoming particularly prominent with its various products (see “Nvidia CEO: Software Is Eating the World, but AI is Going to Eat Software”).

To provide some measure of the performance acceleration offered by its Cloud TPUs, Google says its own translation algorithms can be trained far more quickly on the new hardware than on existing hardware. What would require a full day of training on 32 of the best GPUs can be done in an afternoon using one-eighth of one of its TPU pods.

“These TPUs deliver a staggering 128 teraflops, and are built for just the kind of number crunching that drives machine learning today,” Fei-Fei Li, chief scientist at Google Cloud and the director of Stanford’s AI Lab, said prior to Pichai’s announcement.

A teraflop refers to a trillion “floating point” operations per second, a measure of computer performance obtained by crunching through mathematical calculations. By contrast, the iPhone 6 is capable of about 100 gigaflops, or 100 billion floating-point operations per second.
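The gap between those two figures is easier to see with the units written out. The short Python sketch below uses only the numbers quoted in this article (128 teraflops for a Cloud TPU, roughly 100 gigaflops for the iPhone 6). The variable and function names are illustrative, and these are headline peak figures rather than measured real-world throughput.

```python
TERA = 10**12  # one trillion
GIGA = 10**9   # one billion

# Figures as quoted in the article; both are approximate peak numbers.
cloud_tpu_flops = 128 * TERA
iphone6_flops = 100 * GIGA


def speed_ratio(faster_flops, slower_flops):
    """Return how many times more operations per second the faster device performs."""
    return faster_flops / slower_flops


print(f"Cloud TPU: {cloud_tpu_flops:,} operations per second")
print(f"iPhone 6:  {iphone6_flops:,} operations per second")
print(f"Ratio:     {speed_ratio(cloud_tpu_flops, iphone6_flops):,.0f}x")
```

On these numbers, one Cloud TPU performs about 1,280 times as many floating-point operations per second as the phone.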

Google says researchers will still be able to design algorithms on other hardware before porting them over to the TensorFlow Research Cloud. “This is what democratizing machine learning is all about: empowering developers by protecting freedom of design,” Li added.

A growing number of researchers have adopted TensorFlow since Google released the software in 2015. Google now boasts that TensorFlow is the most widely used deep-learning framework in the world.
