Baidu’s Artificial-Intelligence Supercomputer Beats Google at Image Recognition

A supercomputer specialized for the machine-learning technique known as deep learning could help software understand us better.

Tom Simonite

May 13, 2015

Update: On June 1, 2015, Baidu amended its technical paper on its system to admit that it had broken rules governing the ImageNet Challenge that the company had used to claim it had beaten other research teams. The organizers of the challenge reviewed Baidu’s conduct and issued a statement saying its results should not be considered directly comparable to results obtained by others.

Chinese search company Baidu built this computer to accelerate its artificial-intelligence research.

Chinese search giant Baidu says it has invented a powerful supercomputer that brings new muscle to an artificial-intelligence technique giving software more power to understand speech, images, and written language.

The new computer, called Minwa and located in Beijing, has 72 powerful processors and 144 graphics processors, known as GPUs. Late Monday, Baidu released a paper claiming that the computer had been used to train machine-learning software that set a new record for recognizing images, beating a previous mark set by Google.

“Our company is now leading the race in computer intelligence,” said Ren Wu, a Baidu scientist working on the project, speaking at the Embedded Vision Summit on Tuesday. Minwa’s computational power would probably put it among the 300 most powerful computers in the world if it weren’t specialized for deep learning, said Wu. “I think this is the fastest supercomputer dedicated to deep learning,” he said. “We have great power in our hands—much greater than our competitors.”

Computing power matters in the world of deep learning, which has produced breakthroughs in speech, image, and face recognition and improved the image-search and speech-recognition services offered by Google and Baidu.

The technique is a souped-up version of an approach first established decades ago, in which data is processed by a network of artificial neurons that manage information in ways loosely inspired by biological brains. Deep learning involves using larger neural networks than before, arranged in hierarchical layers, and training them with significantly larger collections of data, such as photos, text documents, or recorded speech.

So far, bigger data sets and networks appear to always be better for this technology, said Wu. That’s one way it differs from previous machine-learning techniques, which had begun to produce diminishing returns with larger data sets. “Once you scaled your data beyond a certain point, you couldn’t see any improvement,” said Wu. “With deep learning, it just keeps going up.” Baidu says that Minwa makes it practical to create an artificial neural network with hundreds of billions of connections—hundreds of times more than any network built before.

A paper released Monday is intended to provide a taste of what Minwa’s extra oomph can do. It describes how the supercomputer was used to train a neural network that set a new record on a standard benchmark for image-recognition software. The ImageNet Classification Challenge, as it is called, involves training software on a collection of roughly 1.2 million labeled images in 1,000 different categories, and then asking that software to use what it learned to label 100,000 images it has not seen before.

Software is compared on the basis of how often its top five guesses for a given image miss the correct answer. The system trained on Baidu’s new computer was wrong only 4.58 percent of the time. The previous best was 4.82 percent, reported by Google in March. One month before that, Microsoft had reported achieving 4.94 percent, becoming the first to better average human performance of 5.1 percent.
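To make the scoring rule concrete, here is a minimal sketch of how a top-five error rate can be computed. It illustrates the metric as described above and is not the challenge's official evaluation code; the function name `top5_error` and the toy scores are assumptions made for this example.

```python
def top5_error(scores, labels, k=5):
    """Fraction of images whose true label is not among the k highest-scoring classes.

    scores: one list of per-class scores per image (hypothetical model output)
    labels: the correct class index for each image
    """
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest score to lowest, truncated to k guesses
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example with 6 classes and 2 images (made-up numbers)
scores = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.0],  # true class 5 has the lowest score: a miss
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.1],  # true class 0 has the highest score: a hit
]
labels = [5, 0]
error = top5_error(scores, labels)
```

On the real benchmark the same calculation would run over 1,000 classes, so a 4.58 percent error rate means the correct label was missing from the five guesses for roughly 46 out of every 1,000 test images.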

Wu said that Minwa had made it possible to train the system on higher-resolution images. It also permitted use of a technique that turned the original 1.2 million training images into two billion by distorting them, flipping them, and altering their colors. Using that larger training set improved accuracy by preventing the system from becoming too fixated on the exact details of the training images, said Wu. The resulting system should be better at handling real-world photos, he said.
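Turning 1.2 million images into two billion works out to roughly 1,700 altered copies of each original. The sketch below shows the general idea of this kind of augmentation using two of the alterations mentioned, flipping and color changes. It is an illustration under stated assumptions, not Baidu's pipeline: the function name `augment`, the 0.8 to 1.2 color range, and the tiny test image are all invented for the example.

```python
import random


def augment(image, rng):
    """Return a randomly altered copy of image.

    image: list of rows, each row a list of (r, g, b) tuples with values 0-255
    rng:   a random.Random instance, so results can be reproduced with a seed
    """
    out = [list(row) for row in image]

    # Mirror the image left to right half of the time
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]

    # Shift the colors: scale each channel by its own random factor
    # (the 0.8-1.2 range is an arbitrary choice for this sketch)
    gains = [rng.uniform(0.8, 1.2) for _ in range(3)]
    out = [
        [tuple(min(255, max(0, round(v * g))) for v, g in zip(px, gains)) for px in row]
        for row in out
    ]
    return out


# A made-up 2x2 image; each call with a fresh random draw yields a new variant
image = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (250, 250, 250)]]
rng = random.Random(0)
variants = [augment(image, rng) for _ in range(4)]
```

Because every variant keeps its original label, the network sees the same object under many slightly different appearances, which is what discourages it from memorizing the exact pixels of any one training image.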

As those slim margins of victory on the ImageNet challenge might suggest, deep learning is now ready for tougher challenges than image recognition, such as interpreting video or describing images in sentences (see “Google’s Brain-Inspired Software Describes What It Sees in Complex Images”). Wu said that as well as thinking about how to make Minwa even larger and use it on video and text, Baidu’s researchers are working on ways to shrink their trained neural networks so they can operate on mobile devices.

He showed a video of a prototype smartphone app that can recognize different breeds of dog, using a condensed version of a deep-learning network trained on a predecessor to Minwa. “If you know how to tap the computational power of a phone’s GPUs, you can actually recognize on the fly directly from the image sensor,” he said.
