- Test-time scaling has taken the AI market by storm over the last few months
- The technique can make AI models smarter by applying extra compute at inference time, when tougher questions call for it
- That could help AI proliferate well beyond the data center
Much like Roy Kent on the pitch (oh hey, Ted Lasso fans), test-time scaling is here, it’s there, it’s every freaking where. But what exactly is the hot new commodity that has taken the artificial intelligence (AI) market by storm? And does it even matter?
The short answer to the latter is a resounding yes. In a wildly oversimplified nutshell, test-time scaling (also known as inference-time scaling) will make AI models more powerful and allow them to answer even more complex questions. That’s why OpenAI, Nvidia, Google and DeepSeek have all been chasing test-time scaling advancements recently.
But how exactly does it work? Well, according to Dave Salvator, Nvidia director of accelerated computing products, test-time scaling “involves applying additional compute at the time of inference to allow models to ‘think longer’ by generating additional reasoning tokens. These allow models to provide more accurate answers, particularly for problems that require more sophisticated reasoning.”
Think of it as the ultimate power-up in a video game.
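For the more hands-on reader, here's what that power-up can look like in practice. The sketch below is a minimal, hypothetical illustration of one common test-time scaling recipe: sample several candidate answers and keep the one they agree on most (sometimes called best-of-N or self-consistency). The `generate_answer` and `answer_with_test_time_scaling` functions are stand-ins for whatever model call a developer would actually use, not a real API.

```python
from collections import Counter

def generate_answer(prompt: str, max_reasoning_tokens: int) -> str:
    """Placeholder for any LLM call that returns a final answer string."""
    raise NotImplementedError

def answer_with_test_time_scaling(prompt: str, n_samples: int = 8,
                                  max_reasoning_tokens: int = 2048) -> str:
    # Extra inference compute comes from two knobs: how many candidate
    # answers we sample, and how many reasoning tokens each one may use.
    candidates = [generate_answer(prompt, max_reasoning_tokens)
                  for _ in range(n_samples)]
    # Self-consistency: keep the answer the samples agree on most often.
    best_answer, _count = Counter(candidates).most_common(1)[0]
    return best_answer
```

Turn either knob up, more samples or a bigger reasoning-token budget, and you're spending more compute at inference time in exchange for (hopefully) a more accurate answer.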
It used to be that developers beefed up their models by adding more tokens and parameters in pre-training. Now, there are two more tools at their disposal, Salvator said: post-training, which refines an already-trained model with additional data, and test-time scaling. Nvidia CEO Jensen Huang talked about these three scaling vectors at CES 2025 last month.
Put it all together and “the combination of pre-training scaling, post-training, and inference time scaling provides many ways to scale to new levels of intelligence,” Salvator explained.
Great, but what does this mean on a practical level?
Well, as researchers from Google and the University of California, Berkeley, noted in a recent paper, “if pre-trained model size can be traded off for additional computation during inference, this would enable LLM deployment in use-cases where smaller on-device models could be used in place of datacenter scale LLMs.”
The read-through can basically be summed up as AI everywhere: if inference-time compute can stand in for sheer model size, smaller models can deliver capable AI far beyond the data center.
Seems Gartner VP Analyst Chirag Dekate wasn’t kidding when he recently told Fierce, “I would argue that the next phase of AI growth has just started.”