Oak Ridge National Laboratory has published an overview of its Crusher system, which is powered by AMD’s Optimized 3rd Gen EPYC CPUs & Instinct MI250X GPUs.
ORNL’s All AMD-Powered Crusher System Overview Published: Features Optimized 3rd Gen EPYC CPUs & Instinct MI250X GPUs
The Crusher system is a test platform for ORNL’s upcoming Frontier supercomputer, which will feature the latest AMD EPYC ‘Trento’ CPUs and Instinct MI250X ‘Aldebaran’ GPUs. As such, it has a smaller number of nodes, but even so, it packs a lot of punch given the hefty number of CPU and GPU cores it features.
Crusher is a National Center for Computational Sciences (NCCS) moderate-security system that contains identical hardware and similar software as the upcoming Frontier system. It is used as an early-access testbed for Center for Accelerated Application Readiness (CAAR) and Exascale Computing Project (ECP) teams as well as NCCS staff and our vendor partners.
The overview published by ORNL states that the Crusher test system will consist of 2 cabinets, one with 128 compute nodes and the other with 64 compute nodes, totaling 192 compute nodes in the full configuration. Each node features a single 64-core AMD EPYC 7A53 CPU, which is based on the 3rd Gen Optimized EPYC CPU architecture. We know that Frontier is going to be powered by AMD’s Trento CPUs, which are an optimized version of the Milan chip. They feature the same 64 cores and 128 threads but with optimizations to clocks and power efficiency. Each CPU will have access to 512 GB of DDR4 memory.
On the GPU side, each node will feature four AMD Instinct MI250X GPUs. Each MI250X packs 2 GCDs (Graphics Compute Dies), and since each node treats a GCD as a separate GPU, a Crusher node will have access to 8 GPUs in total. Each MI250X GPU offers up to 52 TFLOPs of peak FP64 compute horsepower, 220 compute units (110 per GCD) & 128 GB of HBM2e memory (64 GB per GCD) for up to 3.2 TB/s bandwidth per MI250X accelerator. The two GCDs on each MI250X are connected through an Infinity Fabric link that offers 200 GB/s bi-directional bandwidth.
As for interconnects, the AMD EPYC CPU in each node is connected to the GPUs over Infinity Fabric with a peak bandwidth of 36+36 GB/s. Each Crusher node is connected via four HPE Slingshot 200 Gbit per second NICs (25 GB/s each), providing a node-injection bandwidth of 800 Gbps (100 GB/s).
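The per-node and system-wide totals implied by these figures can be checked with a few lines of arithmetic. The following is only a rough sketch in Python: the constants are the numbers quoted in this article, while the variable and function names are made up for illustration and do not come from ORNL or AMD.

```python
# Figures as quoted in the article; all names here are illustrative only.
CABINET_NODES = (128, 64)      # compute nodes in each of the 2 cabinets
MI250X_PER_NODE = 4            # Instinct MI250X accelerators per node
GCDS_PER_MI250X = 2            # each GCD is presented as a separate GPU
HBM_PER_GCD_GB = 64            # HBM2e capacity per GCD
NICS_PER_NODE = 4              # HPE Slingshot NICs per node
NIC_GBIT_PER_S = 200           # line rate of one NIC in Gbit/s


def gbit_to_gbyte(gbit: float) -> float:
    """Convert gigabits to gigabytes (8 bits per byte)."""
    return gbit / 8


total_nodes = sum(CABINET_NODES)
gcds_per_node = MI250X_PER_NODE * GCDS_PER_MI250X
hbm_per_node_gb = gcds_per_node * HBM_PER_GCD_GB
nic_gbyte_per_s = gbit_to_gbyte(NIC_GBIT_PER_S)
injection_gbyte_per_s = NICS_PER_NODE * nic_gbyte_per_s
system_gcds = total_nodes * gcds_per_node

print(f"nodes: {total_nodes}")
print(f"GPUs (GCDs) per node: {gcds_per_node}")
print(f"HBM2e per node: {hbm_per_node_gb} GB")
print(f"injection bandwidth per node: {injection_gbyte_per_s} GB/s")
print(f"GPUs (GCDs) in the full system: {system_gcds}")
```

Going by the quoted numbers, the full 192-node configuration would expose 1,536 GCDs, with 512 GB of HBM2e alongside the 512 GB of DDR4 in each node.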
There are 4 NUMA domains per node and 2 L3 cache regions per NUMA domain, for a total of 8 L3 cache regions. The 8 GPUs are each associated with one of the L3 regions as follows:
- hardware threads 000-007, 064-071 | GPU 4
- hardware threads 008-015, 072-079 | GPU 5
- hardware threads 016-023, 080-087 | GPU 2
- hardware threads 024-031, 088-095 | GPU 3
- hardware threads 032-039, 096-103 | GPU 6
- hardware threads 040-047, 104-111 | GPU 7
- hardware threads 048-055, 112-119 | GPU 0
- hardware threads 056-063, 120-127 | GPU 1
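The mapping above follows a regular pattern: each L3 region covers 8 consecutive cores, and hardware threads 064-127 appear to be the second threads of cores 000-063. A minimal Python sketch of a lookup in both directions is shown below. The function names are hypothetical, and the assumption that thread N+64 shares a core with thread N is inferred from the list rather than stated by ORNL.

```python
from __future__ import annotations

# GPU associated with each L3 cache region, in the order listed above.
GPU_FOR_L3_REGION = [4, 5, 2, 3, 6, 7, 0, 1]
CORES_PER_L3_REGION = 8
PHYSICAL_CORES = 64
HW_THREADS = 2 * PHYSICAL_CORES


def gpu_for_hw_thread(hw_thread: int) -> int:
    """Return the GPU index associated with a hardware thread (0-127)."""
    if not 0 <= hw_thread < HW_THREADS:
        raise ValueError(f"hardware thread out of range: {hw_thread}")
    # Assumption: threads 64-127 are the sibling threads of cores 0-63.
    core = hw_thread % PHYSICAL_CORES
    return GPU_FOR_L3_REGION[core // CORES_PER_L3_REGION]


def hw_threads_for_gpu(gpu: int) -> list[int]:
    """Return the 16 hardware threads whose L3 region is tied to a GPU."""
    region = GPU_FOR_L3_REGION.index(gpu)  # raises ValueError if unknown
    first_core = region * CORES_PER_L3_REGION
    cores = list(range(first_core, first_core + CORES_PER_L3_REGION))
    return cores + [core + PHYSICAL_CORES for core in cores]


print(gpu_for_hw_thread(0))    # first thread of the first L3 region
print(hw_threads_for_gpu(0))   # threads listed next to GPU 0
```

A lookup like this could be used when pinning a process to the cores closest to the GPU it drives, although the actual binding on Crusher would go through the system's job launcher rather than a script of this kind.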
The following block diagram of a single Crusher node shows the interconnect bandwidths between the AMD EPYC CPU and the Instinct MI250X GPU accelerators:
In addition to that, the Crusher system also hosts 250 PB of storage with a peak write speed of 2.5 TB/s, along with access to the center-wide NFS-based filesystem. Expect to see more from AMD’s EPYC CPU and Instinct GPU platforms when they become operational in the Frontier supercomputer this year.