Microsoft Says Programmable Chips Will Make AI Software Smarter

A new approach to powering AI software could produce artificial neural networks of “unprecedented size,” says Microsoft.

Tom Simonite

August 25, 2015

Recent breakthroughs in how accurately software can recognize images and speech came thanks to additional computing power behind a technique known as deep learning. Microsoft now reports progress on an idea that could put even greater muscle behind the technique. A practical way to power up deep learning software even more could lead to further significant advances in the intelligence of machines.

Deep learning software learns to make sense of data using rough simulations of biological neurons (see “10 Breakthrough Technologies 2013: Deep Learning”). One priority for companies such as Google, Microsoft, and Facebook investing in the technology is finding ways to train larger networks of neurons with larger collections of training data, by running the software on more powerful computers.

Using graphics processors, known as GPUs, has proven to be one of the best ways of doing that. But their price and high electricity consumption make GPUs costly even for large companies. “It’s very expensive and challenging to build, maintain, and scale out your own training platform,” says Eric Chung, a researcher at Microsoft. Systems of GPUs used for deep learning are generally “small to medium” in scale compared with the groups of computers that work together to power online services, he says.

Chung is part of a project investigating a possible route to running deep learning at much greater scale. The idea is to use field-programmable gate arrays, or FPGAs, chips that can be reconfigured to implement any design and that can be very power-efficient. Microsoft began using FPGAs to power parts of its Bing search engine last year, and reported in February that it was testing their use to power the virtual neurons of deep learning. Chung says that the research has now advanced to using some of the most powerful FPGAs available, and that it looks like a practical way to deliver a major boost to the power of deep learning. Microsoft is using FPGAs made by Altera, a company that chip maker Intel agreed in June to buy for about $17 billion, citing the potential for such chips to make corporate data centers more powerful.

Even at what Chung called the “prototyping” stage, the team found a nearly tenfold increase in the performance of a neural network attempting to identify images, compared to conventional computers without GPUs. “It could be a game-changer if we eventually manage to deploy FPGAs widely at scale, which will provide an aggregate capability that exceeds what’s possible today,” he says.

Using FPGAs does come with drawbacks, such as the effort required to program them for each task. But Chung predicts the technique will allow training of neural networks of “unprecedented size and quality.”

That might help lead to improvements in things like software that can describe the content of images (see “Google Software Describes What It Sees in Images”), or understand language and show a form of common sense (see “Teaching Machines to Understand Us”). Microsoft’s latest results on using FPGAs were presented on Tuesday at the Hot Chips conference on advances in processor performance in Cupertino, California.
