- DeepSeek's release of new, highly efficient AI models has thrown the market into a panic
- Some are worried that the new models could be bad news for Nvidia and data center investments
- But analysts said that's not the case
Does size really matter? The eternal question reared its head in the artificial intelligence (AI) realm in recent weeks following the release of Chinese startup DeepSeek’s latest models (V3 and R1), which were reportedly trained using a much smaller compute cluster than one might expect.
According to DeepSeek, V3 was trained using just 2,048 Nvidia H800 GPUs – though it’s worth noting there’s some skepticism about just how accurate that figure is. Why does it even matter? Because AI giants like OpenAI, Meta and Anthropic use many more GPUs to train their models.
For instance, OpenAI’s GPT-4 was trained using an estimated 25,000 Nvidia GPUs, while Meta used two 24,000-GPU clusters to train its Llama 3 model.
GPUs, of course, live in data centers. And as the AI wave builds, the race has been on to build bigger and better data centers in the U.S. and abroad to support the tech. But DeepSeek’s training efficiency raises the question of just how much concentrated compute is really needed.
So, we asked analysts: does the DeepSeek news change the game for data center builders?
“My sense of the aggregate compute demand remains the same or even goes up, but its distribution and density footprint could change,” Jason Andersen, VP and principal analyst at Moor Insights & Strategy, told Fierce.
“Companies will still need a lot of computing to train, drive and scale these diverse and interconnected models. In fact, it could even increase the demand if you consider that every time we see a technology get decentralized the market opportunity increases a lot (mainframe to PCs, PCs to mobile, servers to cloud, etc.),” he explained. “What could change, though, is whether they will need it all in the same places or more distributed locations.”
Gartner seemed to have a similar take.
In a research note published this week, the analyst firm argued DeepSeek-R1’s launch is “not proof that scaling models via additional compute and data doesn’t matter, but that it pays off to scale a more efficient model.”
Gartner VP Analyst Chirag Dekate told Fierce that the market panic over DeepSeek is due to a fundamental misunderstanding of what the Chinese team actually accomplished.
“They’re misconstruing it to interpret it as though capex intensive investments from the likes of Microsoft, Google, Amazon, Meta and the like are not effective utilization of underlying resources and nothing could be further from the truth,” Dekate argued.
One of DeepSeek’s key breakthroughs was around effective test-time scaling, Dekate said. Test-time scaling refers to a technique that allows a model to use more compute resources during inferencing, and it actually requires “more compute, not less.”
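To see why spending more compute at inference time can pay off, consider a minimal sketch of one common test-time scaling pattern, best-of-n sampling. This is an illustrative example only – the `generate` and `score` functions are hypothetical placeholders, not DeepSeek's actual method:

```python
# A minimal, hypothetical sketch of test-time scaling via best-of-n sampling.
# `generate` and `score` are placeholder callables, not DeepSeek's technique;
# the point is that inference cost grows with the number of samples n.

from typing import Callable, List


def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 8) -> str:
    """Run the model n times on the same prompt and keep the highest-scoring
    answer. Each extra sample is another full inference pass, so compute
    scales roughly linearly with n."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))
```

Every additional candidate is another full forward pass through the model, which is why techniques like this increase, rather than decrease, inference-time compute demand.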
Additionally, DeepSeek isn’t exactly a frontier model like those from Anthropic, Google, OpenAI and Mistral. There is no way to create one of those types of models without “exploiting leadership-class architectures” like those offered by Nvidia’s more advanced GPUs. And, of course, DeepSeek doesn’t have access to those due to U.S. efforts to keep such tech out of China’s hands.
“If you’re trying to do frontier model scaling and innovation, you will still need extensive scaled compute resources,” he said. Plus, as the market moves beyond text-only inferencing to video, audio and imagery, the compute intensity of inference workloads will increase further.
“So, the data center infrastructure impact continues to be rather stark,” Dekate concluded.