- Blackwell is Nvidia’s new top-tier GPU for AI
- Nvidia unveiled two new DGX supercomputer systems for Blackwell that are ready to plop into data centers
- Hyperscalers are racing to integrate the new chips into their cloud services
To power the artificial intelligence (AI) revolution, we’re going to need a bigger boat. Or rather, bigger GPUs, Nvidia CEO Jensen Huang said during the company’s GTC conference this week. And the company is happy to oblige.
Nvidia officially took the wraps off its B200 GPU, a silicon behemoth known as Blackwell. Seriously, Huang held it up next to the company’s H100 chip on stage and let’s just say Blackwell is one chonky boi.
Packed inside are 208 billion transistors delivering 20 petaflops of AI computing horsepower. Blackwell is not only more powerful than the H100 but also more efficient, with Nvidia claiming reductions in energy consumption of up to 25x.
Huang said that with Nvidia’s Hopper chips, training a GPT model would take 8,000 GPUs and 15 megawatts of power over the course of about 90 days. Blackwell can do the same with 2,000 GPUs and 4 megawatts of power in the same amount of time.
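To put those figures in perspective, here’s a quick back-of-the-envelope comparison of total energy over the 90-day run. This is our own arithmetic based on the numbers Huang quoted on stage, not an Nvidia-supplied calculation:

```python
# Back-of-the-envelope energy comparison for the ~90-day GPT training
# scenario Huang described on stage (our arithmetic, not Nvidia's).
HOURS = 90 * 24  # 2,160 hours

hopper_mw = 15    # 8,000 Hopper GPUs drawing 15 MW
blackwell_mw = 4  # 2,000 Blackwell GPUs drawing 4 MW

hopper_mwh = hopper_mw * HOURS        # 32,400 MWh
blackwell_mwh = blackwell_mw * HOURS  # 8,640 MWh

print(f"Hopper:    {hopper_mwh:,} MWh")
print(f"Blackwell: {blackwell_mwh:,} MWh")
print(f"Reduction: {hopper_mwh / blackwell_mwh:.2f}x")  # 3.75x for this run
```

On these particular numbers, the training scenario works out to a 4x reduction in GPU count and a 3.75x reduction in energy; the headline up-to-25x efficiency figure evidently rests on a different basis of comparison.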
That’s impressive enough in the abstract, but what will these chips look like in the data center? Well, Nvidia has the answer for that, too.
DGX supercomputers
Nvidia pitched two new versions of its DGX supercomputer system: the liquid-cooled DGX SuperPOD, built around the GB200 superchip (which pairs a Grace CPU with two Blackwell GPUs), and a bite-sized air-cooled option called the DGX B200 system.
Dell’Oro Group Research Director Lucas Beran told Silverlinings the former is huge news for the liquid cooling industry. (No pun intended.)
He noted Nvidia wants to keep offering air-cooled products because liquid cooling systems are more complex and many end users are more comfortable with air cooling.
“The fact that they looked at this solution from a holistic perspective, said ‘hey, we can’t really do this at scale with air in any kind of sustainable and efficient manner’ and set it up to be a liquid-cooled solution really confirms the pivot point in the industry,” Beran said.
“If you’re going to be deploying accelerated computing, you should be deploying liquid cooling. And if you’re not, you need to be planning to deploy liquid cooling in the future if you want to continue to deploy the latest and greatest accelerated infrastructure.”
Each rack within the SuperPOD contains 36 GB200 superchips and consumes 120 kilowatts of power. The liquid cooling system pumps in fluid at 25 degrees Celsius at a rate of two liters per second. That fluid absorbs heat and exits at 45 degrees Celsius, Huang said. Multiple racks can be connected for even more muscle.
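Those figures pass a basic thermodynamic smell test. Here’s a rough sketch using the standard Q = ṁ·c·ΔT relation, assuming the coolant has roughly the density and heat capacity of water (an assumption on our part; Huang didn’t specify the fluid’s properties):

```python
# Rough sanity check on the SuperPOD cooling figures Huang quoted,
# assuming the coolant behaves roughly like water (our assumption).
flow_kg_per_s = 2.0   # 2 liters/second, at ~1 kg per liter for water
c_p = 4186            # J/(kg*K), specific heat of water
delta_t = 45 - 25     # K: 25 C inlet, 45 C outlet

heat_removed_kw = flow_kg_per_s * c_p * delta_t / 1000
print(f"Heat removed: {heat_removed_kw:.0f} kW")  # ~167 kW
# Comfortably above the rack's quoted 120 kW draw, so the stated
# flow rate and temperature delta hang together.
```

At roughly 167 kW of heat-removal capacity against a 120 kW rack, the quoted flow rate leaves a healthy margin.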
The smaller DGX B200 system comprises eight Blackwell GPUs and two Intel Xeon processors in a rack-mounted air-cooled design.
Both systems are expected to be available later this year.
Hyperscaler frenzy
As with the H100 chip, hyperscale cloud players rushed to announce their plans to bring Blackwell’s capabilities to market.
AWS said it will offer Blackwell-based Amazon EC2 instances as well as Nvidia’s DGX Cloud supercomputing service to help customers run multi-trillion-parameter large language models. It is also teaming with Nvidia to build a supercomputer comprising 20,736 GB200 superchips for Nvidia’s own R&D use as part of Project Ceiba.
Google Cloud, Microsoft and Oracle are likewise bringing Blackwell’s capabilities and DGX Cloud to market. Oracle is also working with Nvidia on sovereign AI services.
To the extent that any of the hyperscalers deploy DGX SuperPOD racks, Beran said he expects they would be installed in “facilities that are being built which are planned for liquid.”