NVIDIA’s new Grace 144-core CPU Superchip, 600GB of GPU memory

At GTC 2022, Nvidia CEO Jensen Huang finally shared more details about the company’s Arm endeavors as he unveiled the new 144-core Grace CPU Superchip, Nvidia’s first CPU-only Arm chip designed for the data center.

The Neoverse-based system supports Arm v9 and comes as two CPUs fused together with Nvidia’s newly branded NVLink-C2C interconnect tech. Nvidia claims the Grace CPU Superchip offers 1.5X more performance in a SPEC benchmark than two of the last-gen 64-core AMD EPYC processors in its own DGX A100 servers, and twice the power efficiency of today’s leading server chips. Overall, Nvidia claims the Grace CPU Superchip will be the fastest processor on the market when it ships in early 2023 for a wide range of applications, like hyper-scale computing, data analytics, and scientific computing.

Nvidia also claims a 30X improvement in aggregate system memory bandwidth to the GPU over the DGX A100, with a CPU and GPU that were designed for giant-scale AI and HPC. The 900GB/s coherent interface is 7X faster than PCIe 5.0, which is only just reaching gamers’ desktops, with PCIe 5.0-enabled GPUs launching this year.
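The 7X figure can be sanity-checked with some quick arithmetic. The sketch below assumes Nvidia is comparing against a x16 PCIe 5.0 link and counting both directions, which the company has not spelled out here; the 32 GT/s per-lane rate and 128b/130b encoding come from the PCIe 5.0 standard, and protocol overhead is ignored.

```python
# Rough check of the "7X faster than PCIe Gen 5" claim.
# Assumption: the baseline is a x16 PCIe 5.0 link, counting both directions.

PCIE5_GT_PER_S = 32          # PCIe 5.0 raw signaling rate per lane, in GT/s
PCIE5_ENCODING = 128 / 130   # 128b/130b line-encoding efficiency
LANES = 16                   # assumed link width
NVLINK_C2C_GB_S = 900        # figure quoted in the article


def pcie5_bandwidth_gb_s(lanes: int, bidirectional: bool = True) -> float:
    """Theoretical PCIe 5.0 bandwidth in GB/s, before protocol overhead."""
    per_direction = PCIE5_GT_PER_S * PCIE5_ENCODING * lanes / 8  # bits -> bytes
    return per_direction * (2 if bidirectional else 1)


pcie_x16 = pcie5_bandwidth_gb_s(LANES)
ratio = NVLINK_C2C_GB_S / pcie_x16

print(f"PCIe 5.0 x16, both directions: {pcie_x16:.1f} GB/s")
print(f"NVLink-C2C vs. PCIe 5.0 x16:   {ratio:.1f}x")
```

Under those assumptions the ratio lands close to the quoted 7X. Counting only one direction of the PCIe link would double it, so the bidirectional reading looks like the one that matches Nvidia's number.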

  • CPU+GPU designed for giant-scale AI and HPC
  • New 900 gigabytes per second (GB/s) coherent interface, 7X faster than PCIe Gen 5
  • 30X higher aggregate system memory bandwidth to GPU compared to DGX A100
  • Runs all NVIDIA software stacks and platforms, including NVIDIA HPC, NVIDIA AI, and NVIDIA Omniverse
  • High-performance CPU for HPC and cloud computing
  • Superchip design with up to 144 Arm v9 CPU cores
  • World’s first LPDDR5x with ECC Memory, 1TB/s total bandwidth
  • SPECrate2017_int_base over 740 (estimated)
  • 900 GB/s coherent interface, 7X faster than PCIe Gen 5
  • 2X the packaging density of DIMM-based solutions
  • 2X the performance per watt of today’s leading CPU
  • Runs all NVIDIA software stacks and platforms, including RTX, HPC, AI, and Omniverse

Nvidia Grace 144-core CPU Superchip Features

Before we get to the new Grace CPU Superchip, you’ll need a quick refresher on its first instantiation. Nvidia first announced what it originally called its Grace CPU last year, but the company didn’t share too many fine-grained details. Nvidia has now changed the name of this first effort to the Grace Hopper Superchip.

The Grace Hopper Superchip has two distinct chips, one CPU and one GPU, on one carrier board. We now know the CPU has 72 cores, uses a Neoverse-based design that supports Arm v9, and is paired with a Hopper GPU. These two units communicate over a 900 GB/s NVLink-C2C connection that provides memory coherency between the CPU and GPU, allowing both units to have simultaneous access to the pool of LPDDR5X ECC memory, which has a claimed 30X bandwidth improvement over standard systems.

Nvidia originally didn’t announce the amount of LPDDR5X used for the design, but here we can see that the company now claims a ‘600GB Memory GPU,’ which assuredly includes the LPDDR5X memory pool. We know that LPDDR5X tops out at 64GB per package, meaning the CPU, with its eight packages, comes with up to 512GB of LPDDR5X. Meanwhile, the Hopper GPU typically has 80GB of HBM3 capacity, placing us near Nvidia’s 600GB figure. Giving the GPU access to that amount of memory capacity could have a transformative effect on some workloads, particularly for properly optimized applications.
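For anyone who wants to retrace that estimate, here is the arithmetic as a short sketch. The per-package capacity and the HBM3 figure are the ones quoted above; the eight-package count per CPU is the one given later in this article, and the variable names are purely illustrative.

```python
# Back-of-the-envelope check of the "600GB Memory GPU" figure.
LPDDR5X_GB_PER_PACKAGE = 64   # per-package ceiling cited in the article
LPDDR5X_PACKAGES = 8          # packages attached to one Grace CPU (per the article)
HOPPER_HBM3_GB = 80           # typical Hopper HBM3 capacity cited in the article
NVIDIA_CLAIM_GB = 600         # Nvidia's headline figure

lpddr5x_gb = LPDDR5X_GB_PER_PACKAGE * LPDDR5X_PACKAGES
total_gb = lpddr5x_gb + HOPPER_HBM3_GB
gap_gb = NVIDIA_CLAIM_GB - total_gb

print(f"LPDDR5X pool:        {lpddr5x_gb} GB")
print(f"LPDDR5X + HBM3:      {total_gb} GB")
print(f"Gap to 600GB claim:  {gap_gb} GB")
```

The estimate comes out slightly under the headline number, which suggests Nvidia is rounding up or that the shipping GPU carries a little more HBM3 than assumed here.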

Today’s announcement covers the Grace CPU Superchip, which is based on the Grace Hopper CPU+GPU design but uses a second CPU package instead of the Hopper GPU. These two 72-core chips are also connected via NVLink-C2C, which provides a coherent 900 GB/s link that melds them into one 144-core unit. In addition, the Arm v9 Neoverse-based chip supports Arm’s Scalable Vector Extension (SVE), a set of performance-boosting SIMD instructions that function similarly to AVX.

The Grace CPU Superchip uses Arm v9, which tells us that the chip uses the Neoverse N2 design. The Neoverse N2 platform is Arm’s first IP to support newly announced Arm v9 extensions like SVE2 and Memory Tagging, and it delivers up to 40% more performance than the N1 platform. The N2 Perseus platform comes as a 5nm design supporting PCIe Gen 5.0, DDR5, HBM3, CCIX 2.0, and CXL 2.0. The Perseus design is optimized for performance-per-power (watt) and performance-per-area.

That makes plenty of sense given that the Grace CPU Superchip consumes a peak of 500W for the two CPUs and the onboard memory combined. That is competitive with other leading CPUs, like AMD’s EPYC, which tops out at 280W per chip (a figure that doesn’t include memory power consumption). Nvidia claims the Grace CPU will be twice as efficient as competing CPUs when it comes to market.

Each CPU has access to its own eight LPDDR5X packages, so the two chips will still be influenced by the standard NUMA-like tendencies of near and far memory. Still, the increased bandwidth between the two chips should also help reduce latency due to less contention, thus making for a very efficient multi-chip implementation. The device also comes with 396MB of on-chip cache, but it isn’t clear if that is for a single chip or both.

The Grace CPU Superchip memory subsystem provides up to 1TB/s of bandwidth, which Nvidia says is a first for CPUs and more than twice that of other data center processors that will support DDR5 memory. The LPDDR5X comes spread out in 16 packages that provide 1TB of capacity. In addition, Nvidia notes that Grace uses the first ECC implementation of LPDDR5X.
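To put those totals in perspective, the sketch below splits them across the 16 packages and the two CPUs, then works out how many DDR5 channels it would take to reach the same bandwidth. The even split is an assumption, as is the DDR5-4800 speed grade used for comparison; neither comes from Nvidia.

```python
import math

TOTAL_BANDWIDTH_GB_S = 1000   # "up to 1TB/s", taken here as 1000 GB/s
TOTAL_CAPACITY_GB = 1024      # "1TB of capacity", taken here as 1024 GB
PACKAGES = 16                 # LPDDR5X packages on the Superchip
CPUS = 2                      # Grace CPUs on the Superchip

# Assumption, not from the article: DDR5-4800 on a 64-bit (8-byte) channel.
DDR5_MT_PER_S = 4800
DDR5_BYTES_PER_TRANSFER = 8

# Assumes bandwidth and capacity are spread evenly across packages.
per_package_bw = TOTAL_BANDWIDTH_GB_S / PACKAGES
per_package_cap = TOTAL_CAPACITY_GB / PACKAGES
per_cpu_bw = TOTAL_BANDWIDTH_GB_S / CPUS

ddr5_channel_bw = DDR5_MT_PER_S * DDR5_BYTES_PER_TRANSFER / 1000  # GB/s
channels_to_match = math.ceil(TOTAL_BANDWIDTH_GB_S / ddr5_channel_bw)

print(f"Per package: {per_package_bw} GB/s, {per_package_cap} GB")
print(f"Per CPU (near memory): {per_cpu_bw} GB/s")
print(f"DDR5-4800 channel: {ddr5_channel_bw} GB/s")
print(f"DDR5-4800 channels needed for 1TB/s: {channels_to_match}")
```

The per-package capacity lines up with the 64GB LPDDR5X ceiling mentioned earlier, and the per-CPU figure is the "near" bandwidth each chip sees before it has to reach across NVLink-C2C.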

This brings us to benchmarks. Nvidia claims the Grace CPU Superchip is 1.5X faster in the SPECrate2017_int_base benchmark than the two previous-gen 64-core EPYC Rome 7742 processors it uses in its DGX A100 systems. Nvidia based this claim on a pre-silicon simulation that predicts a score of 740+ for the Grace CPU Superchip (370 per chip). AMD’s current-gen EPYC Milan chips, the current performance leaders in the data center, have posted SPEC results ranging from 382 to 424 apiece, meaning the highest-end x86 chips will still hold the lead. However, Nvidia’s solution will have many other advantages, such as power efficiency and a more GPU-friendly design.
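The relationship between those numbers is easier to see when worked through. The sketch below uses only the figures quoted above and treats Nvidia's 740 estimate as an exact score rather than a floor, so the implied Rome result is an inference from the 1.5X claim, not a published SPEC submission.

```python
# Working through the SPECrate2017_int_base figures quoted in the article.
GRACE_SUPERCHIP_SCORE = 740      # Nvidia's pre-silicon estimate, treated as exact
CHIPS_PER_SUPERCHIP = 2
SPEEDUP_VS_DUAL_ROME = 1.5       # Nvidia's claim vs. two EPYC 7742 CPUs
MILAN_PER_CHIP = (382, 424)      # range of per-chip Milan results cited

grace_per_chip = GRACE_SUPERCHIP_SCORE / CHIPS_PER_SUPERCHIP

# Score the dual-Rome baseline would need for the 1.5X claim to hold.
implied_dual_rome = GRACE_SUPERCHIP_SCORE / SPEEDUP_VS_DUAL_ROME
implied_rome_per_chip = implied_dual_rome / 2

# How far ahead of one Grace chip each end of the Milan range sits.
milan_lead = [score / grace_per_chip - 1 for score in MILAN_PER_CHIP]

print(f"Grace per chip:          {grace_per_chip:.0f}")
print(f"Implied dual Rome score: {implied_dual_rome:.0f}")
print(f"Implied Rome per chip:   {implied_rome_per_chip:.0f}")
print(f"Milan lead per chip:     {milan_lead[0]:.1%} to {milan_lead[1]:.1%}")
```

On a per-chip basis, the cited Milan results sit anywhere from a few percent to roughly 15 percent ahead of the Grace estimate, which is why the x86 parts keep the lead in this particular benchmark.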
