Nvidia has officially entered the CPU market with Grace, a data center CPU designed to accompany GPUs in large-scale AI and high-performance computing (HPC) applications.
“Today, we’re introducing a new kind of computer, the basic building block of the modern data center,” said Nvidia CEO Jensen Huang in his keynote speech at the company’s GPU Technology Conference (GTC). Grace “brings together the latest GPU accelerated computing, Mellanox high-performance networking, and something brand new: the final piece of the puzzle.”
Huge natural language processing (NLP) models such as GPT-3 are hungry for compute, and growing model size and complexity is driving demand for ever more powerful AI computers in the data center and the cloud.
GPUs are designed for fast compute with high memory bandwidth, but the CPU, which sits between system memory and the GPUs, is often the bottleneck in moving data into the GPUs. In his GTC keynote speech, Huang described a training system designed for AI models so large that they cannot fit into GPU memory. A typical system, he said, might have four GPUs, each with 80 GB of super-fast memory running at 2 TB/s, for a total of 320 GB at 8 TB/s. Beside the GPUs is a CPU with 1 TB of memory running at only 0.2 TB/s. The CPU memory is three times larger, but 40 times slower than the GPU memory.
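The ratios Huang quoted follow directly from multiplying out the per-GPU figures. The short Python sketch below does exactly that and nothing more; it uses peak figures only and assumes decimal units (1 TB treated as 1,000 GB).

```python
# Keynote example: four GPUs, each with 80 GB of memory at 2 TB/s,
# next to a CPU with 1 TB of memory at 0.2 TB/s (peak figures).
NUM_GPUS = 4
GPU_MEM_GB = 80        # memory per GPU
GPU_BW_TBPS = 2.0      # memory bandwidth per GPU
CPU_MEM_GB = 1000      # assumption: 1 TB treated as 1,000 GB (decimal units)
CPU_BW_TBPS = 0.2      # CPU memory bandwidth

# Aggregate GPU-side figures for the whole system.
total_gpu_mem_gb = NUM_GPUS * GPU_MEM_GB      # 320 GB
total_gpu_bw_tbps = NUM_GPUS * GPU_BW_TBPS    # 8 TB/s

# How much larger, and how much slower, the CPU memory is.
capacity_ratio = CPU_MEM_GB / total_gpu_mem_gb     # about 3x larger
slowdown_ratio = total_gpu_bw_tbps / CPU_BW_TBPS   # 40x slower

print(f"GPU memory: {total_gpu_mem_gb} GB at {total_gpu_bw_tbps} TB/s")
print(f"CPU memory is {capacity_ratio:.1f}x larger and {slowdown_ratio:.0f}x slower")
```

The capacity ratio comes out at about 3.1, which rounds to the "three times larger" in the keynote, and the bandwidth ratio is exactly 40.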
With faster CPU memories and dedicated channels between the CPU and each GPU, the situation improves, but PCIe becomes the bottleneck. NVLink, designed as a fast GPU-GPU interconnect, could be used, but there are no x86 CPUs that have NVLink, let alone four NVLink channels.
Enter Grace, Nvidia’s purpose-built CPU named after US computing pioneer Grace Hopper. It is designed for large-scale accelerated computing in data center and HPC applications, with four next-generation NVLink channels providing an extremely fast interconnect: 900 GB/s of bidirectional bandwidth. This can optimize data movement and simplify programming by providing a single large address space. Grace is also cache coherent, which further simplifies programming.
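To get a feel for why the interconnect matters, here is a back-of-the-envelope Python sketch of idealized one-way copy times. Several assumptions are not from the keynote. It takes roughly 64 GB/s of bidirectional bandwidth for a PCIe Gen4 x16 link, an approximate figure. It treats the quoted 900 GB/s as the bidirectional bandwidth of the CPU-to-GPU path, so a one-way copy gets half of each figure. The 80 GB payload is a hypothetical working set equal to one GPU's memory, and latency and protocol overhead are ignored.

```python
# Idealized one-way transfer time: size / bandwidth, ignoring latency and overhead.
def transfer_seconds(size_gb: float, bidirectional_gb_per_s: float) -> float:
    # A one-way copy can use only half of a bidirectional bandwidth figure.
    one_way_gb_per_s = bidirectional_gb_per_s / 2
    return size_gb / one_way_gb_per_s

# Assumption: refill one GPU's 80 GB of memory from CPU memory.
PAYLOAD_GB = 80

links = {
    "PCIe Gen4 x16 (approximate, assumed)": 64.0,
    "Next-generation NVLink (figure quoted for Grace)": 900.0,
}

for name, gb_per_s in links.items():
    print(f"{name}: {transfer_seconds(PAYLOAD_GB, gb_per_s):.2f} s")
```

Under these assumptions the PCIe copy takes about 2.5 s and the NVLink copy about 0.18 s, a difference of roughly 14x. Real transfers will be slower on both links.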
Grace uses LPDDR5x memory technology, which offers twice the bandwidth of DDR4 at 10X the energy efficiency. LPDDR5x is already popular in the mobile world, but Nvidia has been working with its partners to add server-class reliability, through mechanisms such as ECC and redundancy, so that the memory is suitable for the data center.
The result is systems that can train trillion-parameter AI models in just a few days rather than the months they take today, 10X faster than today’s best architectures. Systems this large could also make real-time inference possible, opening up a raft of exciting applications.
Armed and ready
Grace is based on an as-yet-unreleased Arm Neoverse core. Each Grace CPU delivers a SPECint rate of 300, for a total of more than 2,400 in an eight-GPU DGX system.
“Grace highlights the beauty of Arm,” said Huang. “Their IP model allows us to create the optimal CPU for this application, which achieves X-Factor speed up.”
Nvidia announced six months ago that it intends to acquire Arm, but Grace would have been in development long before that decision. The Arm architecture has been gaining ground in the data center over the last few years because its cores are energy efficient and allow for denser racks. Nvidia already uses Arm CPU cores in its BlueField DPUs.
“Arm is the most popular CPU in the world, for good reason — it’s super energy-efficient and its open licensing model inspires a world of innovators to create products around it,” Huang said. “Arm is used broadly in mobile and embedded today. For other markets like the cloud, enterprise and edge data centers, supercomputing and PCs, Arm is just starting and has great growth opportunities. Each market has different applications and has unique systems, software, peripherals, and ecosystems. For the markets we serve, we can accelerate Arm’s adoption.”
Nvidia’s primary competition in the data center CPU market is Intel, with AMD following behind. All three companies have been building out their computing platforms to prepare for the age of heterogeneous compute. Intel has the most complete portfolio, and AMD is about to complete its acquisition of Xilinx; with the Grace launch, Nvidia is edging closer to a complete platform of its own. However, Huang stressed that Nvidia architectures and platforms will support x86 and Arm, “whatever customers and markets prefer.”
Huang said Nvidia’s data center roadmap was now centered on three silicon lines: CPU, GPU and DPU (data processing unit, Nvidia’s Arm-based NIC chip).
“Each chip architecture has a two-year rhythm, with likely a kicker in between,” Huang said. “One year, we’ll focus on x86 platforms, one year we’ll focus on Arm platforms. Every year, we’ll see new exciting products from us.”
Supercomputing with Grace
Nvidia’s first customer for Grace is CSCS, the Swiss National Supercomputing Centre. Grace will power CSCS’s brand-new supercomputer, Alps, which will be the world’s fastest AI supercomputer at 20 exaflops (7x faster than Nvidia’s Selene). Alongside Grace, Alps will use a not-yet-announced Nvidia GPU. It will be used to advance research in climate and weather, materials science and molecular dynamics, as well as in domains such as economics and the social sciences. Alps will be built by Hewlett Packard Enterprise and is due to come online in 2023.
Huang also used his keynote to announce a couple of interesting partnerships.
Nvidia will collaborate with AWS on cloud instances that combine AWS’s Arm-based Graviton2 CPUs with Nvidia GPUs for AI and cloud gaming. Calling Graviton CPUs “extremely impressive,” Huang pointed out that mobile gaming is growing fast and is the primary form of gaming in some markets today. The new Nvidia-AWS instances will allow users to stream Arm-based applications and Android games straight from AWS. They are expected later this year.
Another key partnership is with Ampere, a startup building Arm-based CPUs for data center and cloud applications. This partnership will create scientific and cloud computing SDKs and reference systems.