Depending on the hardware you're using, training a large language model of any significant size can take weeks, months, even years to complete. That's no way to do business — nobody has the electricity and time to be waiting that long. On Wednesday, NVIDIA unveiled the newest iteration of its Eos supercomputer, one powered by more than 10,000 H100 Tensor Core GPUs and capable of training a 175 billion-parameter GPT-3 model on 1 billion tokens in under four minutes. That's three times faster than the previous benchmark on the MLPerf AI industry standard, which NVIDIA set just six months ago.
Eos represents an enormous amount of compute. It leverages 10,752 GPUs strung together using NVIDIA's InfiniBand networking (moving a petabyte of data a second) and 860 terabytes of high bandwidth memory (36PB/sec of aggregate bandwidth and 1.1PB/sec of interconnect bandwidth) to deliver 40 exaflops of AI processing power. The entire cloud architecture comprises 1,344 nodes, individual servers that companies can rent access to for around $37,000 a month to expand their AI capabilities without building out their own infrastructure.
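As a rough sanity check, the GPU and memory totals can be rebuilt from the node count. The sketch below assumes each node is a DGX H100-style server with eight GPUs, and that each H100 carries 80GB of high bandwidth memory. Neither per-unit figure is stated above, and the variable names are purely illustrative.

```python
# Back-of-the-envelope check of the Eos totals quoted in the article.
NODES = 1_344            # node count from the article
GPUS_PER_NODE = 8        # assumption: DGX H100-style node with eight GPUs
HBM_PER_GPU_GB = 80      # assumption: 80GB of HBM per H100

total_gpus = NODES * GPUS_PER_NODE
total_hbm_tb = total_gpus * HBM_PER_GPU_GB / 1_000  # decimal terabytes

print(f"GPUs: {total_gpus:,}")
print(f"High bandwidth memory: {total_hbm_tb:.0f} TB")
```

Under those assumptions, the totals line up with the 10,752 GPUs and 860 terabytes of memory cited for Eos.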
In all, NVIDIA set six records in nine benchmark tests: the 3.9 minute notch for GPT-3, a 2.5 minute mark to train a Stable Diffusion model using 1,024 Hopper GPUs, a minute even to train DLRM, 55.2 seconds for RetinaNet, 46 seconds for 3D U-Net and just 7.2 seconds for the BERT-Large model.
NVIDIA was quick to note that the 175 billion parameter version of GPT-3 used in the benchmarking is not the full-sized iteration of the model (neither was the Stable Diffusion model). The full GPT-3 training run covers around 3.7 trillion tokens and is just flat out too big and unwieldy for use as a benchmarking test. For example, it'd take 18 months to train it on the older A100 system with 512 GPUs, though Eos needs just eight days.
So instead, NVIDIA and MLCommons, which administers the MLPerf standard, leverage a more compact version that uses 1 billion tokens (the smallest unit of data that generative AI systems understand). This test uses a GPT-3 version with the same number of potential switches to flip as the full-size model (those 175 billion parameters), just with a much more manageable data set (a billion tokens vs 3.7 trillion).
The impressive improvement in performance, granted, came from the fact that this recent round of tests employed 10,752 H100 GPUs compared to the 3,584 Hopper GPUs the company used in June's benchmarking trials. However, NVIDIA explains that despite tripling the number of GPUs, it managed to maintain 2.8x scaling in performance, a 93 percent efficiency rate, through the generous use of software optimization.
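The efficiency figure follows from dividing the measured speedup by the ideal, perfectly linear speedup. Here is a minimal sketch of that arithmetic using the numbers from the article; the function name is illustrative and not part of any MLPerf or NVIDIA tooling.

```python
def scaling_efficiency(gpus_before: int, gpus_after: int, speedup: float) -> float:
    """Return the achieved speedup as a fraction of ideal linear speedup."""
    ideal_speedup = gpus_after / gpus_before  # 3x when the GPU count triples
    return speedup / ideal_speedup

# June round: 3,584 GPUs. This round: 10,752 GPUs with a 2.8x speedup.
efficiency = scaling_efficiency(3_584, 10_752, 2.8)
print(f"Scaling efficiency: {efficiency:.0%}")
```

Perfect scaling would score 100 percent; any shortfall is performance lost to communication and coordination overhead as the cluster grows.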
"Scaling is a wonderful thing," Salvator said. "But with scaling, you're talking about more infrastructure, which can also mean things like more cost." An efficiently scaled increase means users are "making the best use of your infrastructure so that you can basically just get your work done as fast [as possible] and get the most value out of the investment that your organization has made."
The chipmaker was not alone in its development efforts. Microsoft's Azure team submitted a similar 10,752 H100 GPU system for this round of benchmarking, and achieved results within two percent of NVIDIA's.
"[The Azure team have] been able to achieve a performance that's on par with the Eos supercomputer," Dave Salvator, Director of Accelerated Computing Products at NVIDIA, told reporters during a Tuesday prebrief. What's more, "they are using InfiniBand, but this is a commercially available instance. This isn't some pristine laboratory system that will never have actual customers seeing the benefit of it. This is the actual instance that Azure makes available to its customers."
NVIDIA plans to apply these expanded compute abilities to a variety of tasks, including the company's ongoing work in foundational model development, AI-assisted GPU design, neural rendering, multimodal generative AI and autonomous driving systems.
"Any good benchmark looking to maintain its market relevance has to continually update the workloads it's going to throw at the hardware to best reflect the market it's looking to serve," Salvator said, noting that MLCommons has recently added a benchmark for testing model performance on Stable Diffusion tasks. "This is another exciting area of generative AI where we're seeing all sorts of things being created," from programming code to newly discovered protein chains.
These benchmarks are important because, as Salvator points out, the current state of generative AI marketing can be a bit of a "Wild West." The lack of stringent oversight and regulation means, "we sometimes see with certain AI performance claims where you're not quite sure about all the parameters that went into generating those particular claims." MLPerf provides the professional assurance that the benchmark numbers companies generate using its tests "were reviewed, vetted, in some cases even challenged or questioned by other members of the consortium," Salvator said. "It's that sort of peer reviewing process that really brings credibility to these results."
NVIDIA has been steadily focusing on its AI capabilities and applications in recent months. "We are at the iPhone moment for AI," CEO Jensen Huang said during his GTC keynote in March. At that time, the company announced its DGX Cloud system, which portions out slivers of the supercomputer's processing power, specifically instances of eight H100 or A100 chips with 80GB of VRAM each (640GB of memory in total). The company expanded its supercomputing portfolio with the release of DGX GH200 at Computex in May.