The Pint-Sized Supercomputer That Companies Are Scrambling to Get

Dozens of organizations are shelling out $129,000 for a box that will help them train AI software.

Elizabeth Woyke

December 14, 2016

Nvidia’s DGX-1 supercomputer is designed to train deep-learning models faster than conventional computing systems do.

To companies grappling with complex data projects powered by artificial intelligence, a system that Nvidia calls an “AI supercomputer in a box” is a welcome development.

Early customers of Nvidia’s DGX-1, which combines machine-learning software with eight of the chip maker’s highest-end graphics processing units (GPUs), say the system lets them train their analytical models faster, enables greater experimentation, and could facilitate breakthroughs in science, health care, and financial services.

Data scientists have been leveraging GPUs to accelerate deep learning—an AI technique that mimics the way human brains process data—since 2012, but many say that current computing systems limit their work. Faster computers such as the DGX-1 promise to make deep-learning algorithms more powerful and let data scientists run deep-learning models that previously weren’t possible.

The DGX-1 isn’t a magical solution for every company. It costs $129,000, more than systems that companies could assemble themselves from individual components. It also comes with a fixed amount of system memory and GPU cards. But because the relevant parts and programs are preinstalled in a metal enclosure about the size of a medium suitcase, and since it pairs advanced hardware with fast connectivity, Nvidia claims the DGX-1 is easier to set up and quicker at analyzing data than previous GPU systems. Moreover, the positive reception the DGX-1 has attracted in its first few months of availability suggests that similar all-in-one deep-learning systems could help organizations run more AI experiments and refine them more rapidly. Though the DGX-1 is the only system of its kind today, Nvidia’s manufacturing partners will release new versions of the supercomputer in early 2017.

Fewer than 100 companies and organizations have bought DGX-1s since they started shipping in the fall, but early adopters say Nvidia’s claims about the system seem to hold up. Jackie Hunter, CEO of London-based BenevolentAI’s life sciences arm, BenevolentBio, says her data science team had models training on the DGX-1 the same day it was installed. She says the team was able to develop several large-scale models designed to identify suitable molecules for drugs within eight weeks. These models train three to four times faster on the DGX-1 than on the startup’s other GPU systems, according to Hunter. “We had multiple models that originally took weeks to train, but we can now do this in days and hours instead,” she adds.

Massachusetts General Hospital has a DGX-1 in one of its data centers and has one more on order. It says it needs GPU supercomputers such as the DGX-1 to crunch large volumes of dissimilar types of data. MGH’s Center for Clinical Data Science, which is coördinating access to the hospital’s DGX-1 across the Boston-area Partners HealthCare system, says projects using the supercomputer will involve analyzing pathology and radiology images, electronic health records, and genomic information.

“If you’re incorporating not just x-rays, but a whole host of clinical information, billing information, and social media feeds as indicators of a patient’s health, you really do need large amounts of GPU computing power to crush that,” says center director Mark Michalski.

Several other organizations are deploying DGX-1s to make sense of huge quantities of data related to health care and medical research. Argonne and Oak Ridge national laboratories use theirs to study the origins of cancer and identify new therapies as part of Joe Biden’s Cancer Moonshot project.

DGX-1s are in active use in the AI research community as well. Nvidia donated the first DGX-1 it produced to the nonprofit AI research company OpenAI and gave nine other systems to universities with prominent deep-learning departments, including New York University, Stanford University, and the University of Toronto.

Multinational corporations are also snapping up the systems. SAP, which makes software to help businesses manage their operations and customer relations, has installed DGX-1s in two of its global innovation centers, one in Potsdam, Germany, and one in Ra’anana, Israel. The company is running proof-of-concept projects on the systems to identify the best ways to make use of their scale and speed, says vice president Markus Noga. Fidelity Labs, the R&D arm of Fidelity Investments, also owns two DGX-1s and plans to use them to build neural networks, or computer systems modeled on the human brain, says labs director Sean Belka.

Even those who already own a DGX-1 will likely continue to use a mix of high-performance computing systems, including cloud computing and other GPU-based systems, rather than move all of their deep-learning work to the supercomputer. Other companies might not buy one in the first place because of its steep upfront cost and fixed configuration.

But many seem to think the price is worth it. BenevolentAI estimates that the cost of renting enough servers on Amazon Web Services to match the DGX-1’s performance would surpass the system’s $129,000 price tag within a year. Greg Diamos, a senior researcher in Baidu’s Silicon Valley AI Lab, who is an expert in high-performance computing, acknowledges that the supercomputer is expensive but says the price reflects the configuration work and support Nvidia provides. Baidu’s AI Lab does not have a DGX-1, but is in the process of upgrading its system to the same GPU cards, and anticipates that the new technology will accelerate its AI research by about 3.5 times, according to Diamos.
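
BenevolentAI's estimate can be read as a simple break-even calculation. The sketch below is illustrative only: the function names are hypothetical, the only figure taken from the article is the $129,000 price, and it assumes a steady monthly cloud bill while ignoring power, staffing, and depreciation.

```python
# Purchase price quoted in the article.
DGX1_PRICE_USD = 129_000


def implied_min_monthly_cloud_cost(price_usd: float, months: float) -> float:
    """Smallest average monthly cloud bill that adds up to price_usd within `months`."""
    if months <= 0:
        raise ValueError("months must be positive")
    return price_usd / months


def months_to_break_even(price_usd: float, monthly_cloud_cost_usd: float) -> float:
    """Months of cloud rental after which cumulative spending equals price_usd."""
    if monthly_cloud_cost_usd <= 0:
        raise ValueError("monthly_cloud_cost_usd must be positive")
    return price_usd / monthly_cloud_cost_usd


if __name__ == "__main__":
    # "Within a year" implies an average cloud bill of at least this much per month.
    print(implied_min_monthly_cloud_cost(DGX1_PRICE_USD, 12))  # 10750.0
```

Under these assumptions, cloud rental overtakes the purchase price within a year only if the equivalent bill averages at least $10,750 a month. A team that would use the hardware only occasionally would reach break-even much later.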

“Companies that are focused on building deep-learning applications and don’t want to worry about designing the hardware and software platform that they run on will probably consider the DGX-1,” Diamos says. “But I expect larger customers who do all of this work in-house to buy individual GPUs and integrate them themselves into custom HPC clusters rather than paying the premium for the DGX-1.”
