NVIDIA H100s top Graph500 on CoreWeave cluster
The GPU superpower ran a Graph500 breadth-first search that reached 410 trillion traversed edges per second.
NVIDIA achieved a Graph500 breadth-first search (BFS) result of 410 trillion traversed edges per second on an accelerated computing cluster hosted in a CoreWeave data center in Dallas, using 8,192 H100 GPUs to process a graph of 2.2 trillion vertices and 35 trillion edges.
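The reported graph size fits Graph500's usual convention of 2^scale vertices and a fixed number of edges per vertex. The article does not state the problem scale. The short Python check below assumes a scale of 41 and the benchmark's default edge factor of 16. Under those assumptions, it also estimates how long one search would take at 410 trillion TEPS if every edge were counted. That is a simplification of the official TEPS rules.

```python
# Back-of-envelope check of the graph size and search time.
# Assumptions (not stated in the article): a Graph500 problem of scale 41
# with the benchmark's default edge factor of 16.

EDGE_FACTOR = 16         # Graph500 default edge factor; assumed for this run
REPORTED_TEPS = 410e12   # traversed edges per second, from the article


def graph_size(scale: int, edge_factor: int = EDGE_FACTOR) -> tuple[int, int]:
    """Return (vertices, edges) for a Graph500-style problem: 2**scale
    vertices and edge_factor edges per vertex."""
    vertices = 2 ** scale
    return vertices, edge_factor * vertices


vertices, edges = graph_size(41)
print(f"{vertices / 1e12:.1f} trillion vertices, {edges / 1e12:.1f} trillion edges")
# -> 2.2 trillion vertices, 35.2 trillion edges

# TEPS = edges counted / search time, so time = edges / TEPS. Counting every
# edge is a simplification of the official benchmark rules.
print(f"~{edges / REPORTED_TEPS * 1e3:.0f} ms per search")  # ~86 ms per search
```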
The run ranked No. 1 on the 31st Graph500 BFS list, the company said in a statement. NVIDIA said the result more than doubled the performance of comparable entries on the list, including runs at national laboratories.
The company said the submission used just over 1,000 nodes, compared with about 9,000 nodes for a comparable top-10 entry, and delivered roughly three times the performance per dollar. NVIDIA noted that needing less hardware and less time lowered costs at this scale.
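The "just over 1,000 nodes" figure follows from the GPU count if each server holds eight H100 GPUs. The article does not give a per-node GPU count, so eight per node is an assumption here.

```python
GPUS = 8_192              # from the article
GPUS_PER_NODE = 8         # assumption: eight H100 GPUs per server
COMPARISON_NODES = 9_000  # "about 9,000 nodes" for the comparable entry

nodes = GPUS // GPUS_PER_NODE
print(f"{nodes:,} nodes")  # -> 1,024 nodes
print(f"{COMPARISON_NODES / nodes:.1f}x as many nodes in the comparison run")
# -> 8.8x as many nodes in the comparison run
```

The node count alone does not yield the performance-per-dollar figure, which depends on pricing the article does not give.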
NVIDIA said it reached the result by combining its CUDA platform, Spectrum-X networking, H100 GPUs and a new active messaging library, and by reengineering graph processing so that active messaging runs entirely on GPUs. A custom framework using InfiniBand GPUDirect Async (IBGDA) and the NVSHMEM parallel programming interface enabled GPU-to-GPU active messages and large-scale message aggregation, the company added.
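The single-process Python sketch below illustrates the general idea behind aggregated active messages. It runs a level-synchronous BFS with vertices partitioned across simulated ranks. It is not NVIDIA's library and does not use NVSHMEM or IBGDA, and the rank count, batch size and function names are hypothetical. Each rank sends "visit w from u" messages to the rank that owns w. It buffers those messages per destination and ships each full buffer as one batch. The receiving rank then runs the visit handler on every message in the batch.

```python
from collections import defaultdict

NUM_RANKS = 2   # hypothetical number of simulated GPUs
BATCH_SIZE = 4  # hypothetical aggregation threshold (messages per send)


def owner(v: int) -> int:
    """1D partition: vertex v is stored on rank v % NUM_RANKS."""
    return v % NUM_RANKS


def bfs_with_aggregation(adj: dict[int, list[int]], root: int):
    """Level-synchronous BFS where ranks exchange batched 'visit' messages.

    Returns (parent, sends, messages): the BFS tree, the number of batched
    sends, and the number of individual messages those sends carried.
    Local deliveries are treated like remote ones to keep the sketch short.
    """
    parent = {root: root}
    frontier = defaultdict(list)          # rank -> vertices in current level
    frontier[owner(root)].append(root)
    sends = messages = 0

    while any(frontier.values()):
        inbox = defaultdict(list)         # rank -> batches received this level

        # Send phase: every rank expands its local frontier vertices.
        for local_vertices in frontier.values():
            buffers = defaultdict(list)   # destination rank -> pending messages
            for u in local_vertices:
                for w in adj.get(u, []):
                    dst = owner(w)
                    buffers[dst].append((w, u))  # message: "visit w from u"
                    messages += 1
                    if len(buffers[dst]) == BATCH_SIZE:
                        inbox[dst].append(buffers.pop(dst))  # ship full batch
                        sends += 1
            for dst, pending in buffers.items():  # flush partial batches
                inbox[dst].append(pending)
                sends += 1

        # Receive phase: each owner runs the visit handler on its messages.
        frontier = defaultdict(list)
        for dst, batches in inbox.items():
            for batch in batches:
                for w, u in batch:
                    if w not in parent:
                        parent[w] = u
                        frontier[dst].append(w)

    return parent, sends, messages


# Toy undirected graph (a tree) given as adjacency lists.
adj = {0: [1, 2, 3], 1: [0, 4, 5], 2: [0, 6], 3: [0, 7],
       4: [1], 5: [1], 6: [2], 7: [3]}
parent, sends, messages = bfs_with_aggregation(adj, 0)
print(f"{messages} messages in {sends} sends")  # 14 messages in 8 sends
```

On this toy tree, 14 messages travel in 8 sends. Batching many small messages into fewer, larger transfers is the general motivation for aggregation at much larger scales.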
The company illustrated the scale by saying a global social graph of 1.2 trillion edges could be searched in about three milliseconds. It said the win shows the NVIDIA computing platform can expand access to acceleration for very large sparse and irregular workloads and that developers can use NVSHMEM and IBGDA to scale large HPC applications on commercially available infrastructure.
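The three-millisecond figure is simple division. The check below assumes that a search of the smaller social graph sustains the same 410 trillion TEPS measured on the benchmark graph, which a real run may not achieve.

```python
REPORTED_TEPS = 410e12      # traversed edges per second, from the article
SOCIAL_GRAPH_EDGES = 1.2e12 # edges in the example social graph, from the article

# Assumes the benchmark's throughput carries over unchanged to this graph.
seconds = SOCIAL_GRAPH_EDGES / REPORTED_TEPS
print(f"{seconds * 1e3:.1f} ms")  # 2.9 ms
```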
The Recap
- NVIDIA reached 410 trillion traversed edges per second on Graph500.
- Run used 8,192 H100 GPUs on CoreWeave cluster.
- Result processed a graph of 2.2 trillion vertices.