Tiny AI is poised to do big things

  • LLMs dominate the news but tiny models pack a punch
  • There are tiny models, and then there are even smaller “teeny” models for on-device applications
  • Tiny models are proliferating in the shadow of LLMs while teeny models are tied to hardware cycles

Large language models (LLMs) and artificial general intelligence (AGI) may be sexier, but tiny AI is poised to do an increasing amount of heavy lifting behind the scenes.

“Are they a big deal? I think yes and there’s not enough attention paid to them,” David Cox, IBM’s VP for AI Models, said of tiny models in an interview with Fierce. “It’s less glamorous in some ways than the AGI race, but it’s very exciting if you’re actually trying to deploy AI.”

While LLMs act as Swiss army knives with a wide range of knowledge and functionality, their capabilities are relatively shallow compared to tiny, specialized models. In practical terms, that means tiny AI models can do specific repetitive tasks faster, more cheaply and more reliably than their larger counterparts.

For enterprises – including telcos – that means helping with everything from crunching receipts and writing repetitive bits of code to forecasting supply chain issues and maintenance needs. Within telco networks, tiny models can also enable more sophisticated operations closer to the edge of the network as well as on-device intelligence, cyber threat detection and infrastructure monitoring, Cox and AIZip CEO Yubei Chen said.

The latter two “need to be real-time so it’s much better to use small models running on-device. This means that model efficiency really matters,” Chen said.

Teeny vs. tiny models

Like “the edge,” tiny AI means different things to different people. And in the AI world, size is usually measured in parameters.

Generally speaking, LLMs can have 50+ billion parameters, while small language models are in the realm of 10 billion to 30 billion parameters. IBM’s Granite family provides even smaller options, in the few-hundred-million to 8 billion range.

Then you have teeny models (TM, we’re calling it), such as those built by startups like AIZip, which can be as minuscule as 8,000 parameters (no, we’re not missing zeroes).
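For a rough sense of what those parameter counts mean in practice, here’s a back-of-the-envelope Python sketch of memory footprints. The bytes-per-parameter figures are illustrative assumptions (fp16 and quantized precisions), not vendor specs, and real deployments add overhead for activations, caches and runtimes:

```python
# Rough memory footprint: parameters x bytes per parameter.
# Illustrative only -- not measurements of any specific product.

def approx_memory_mb(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / 1e6

models = {
    "50B LLM (fp16)": (50e9, 2),
    "8B small model (fp16)": (8e9, 2),
    "8B small model (int4)": (8e9, 0.5),
    "8,000-param teeny model (int8)": (8e3, 1),
}

for name, (params, bytes_per_param) in models.items():
    print(f"{name}: ~{approx_memory_mb(params, bytes_per_param):,.3f} MB")
```

The gap spans seven orders of magnitude, which is why an 8,000-parameter model can live comfortably inside a sensor or camera while a 50-billion-parameter LLM needs a rack of GPUs.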

As you might guess from so broad a size range, capabilities can vary widely. We’re over the toolbox analogy, so think of it like a music collection: there are different tracks you play when you’re in different moods or doing different tasks.

Sure, you could play smooth jazz all the time like the Weather Channel used to, but it’s probably not the best choice for when you’re angry or want an upbeat tune to dance to while cleaning the house (we know we’re not the only ones).

Ditto with AI – there’s a spectrum, and the key is matching the model to the task and environment. That’s where teeny and tiny models – which are either built from the ground up or distilled from larger models – come in.

AI matchmaking

So, how do you choose?

“For things that are operating in real time…you might as well just zero in on just the one thing you need and get as tight as possible,” Cox said. “But for other things that might require a bit of reasoning, you now have a spectrum of different capabilities.”

For very simple, repetitive tasks that demand real-time outputs in a very constrained environment (think car backup cameras and other endpoint and on-device AI), teeny models like those offered by AIZip and others are a great choice. These models are built to do just a few tasks but do them very, very well with few resources.

Chen told Fierce that to get its models as small as they are, AIZip builds them from the ground up, optimizing each teeny model’s architecture and carefully curating its inputs. The company also employs a variety of techniques like quantization, pruning and hardware-based optimization.


Tiny AI terms to know

  • Quantization: Reducing the amount of data stored for each parameter to make an AI model smaller and more efficient
  • Distillation: The process through which key knowledge from a larger model is sifted out and transferred to a smaller model
  • Pruning: The removal of model weights or connections that contribute little to the model’s outputs, shrinking it with minimal impact on performance (quantization and pruning are both illustrated in the toy sketch below)
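To make two of those terms concrete, here’s a toy NumPy sketch of quantization and pruning applied to a small weight matrix. This is a conceptual illustration only – production toolchains, including whatever AIZip uses internally, are far more sophisticated:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0, 0.5, size=(4, 4)).astype(np.float32)

# Quantization: map float32 weights to 8-bit integers plus a scale factor,
# cutting storage per parameter from 4 bytes to 1.
scale = np.abs(weights).max() / 127
quantized = np.round(weights / scale).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale

# Pruning: zero out weights whose magnitude falls below a threshold,
# on the assumption they contribute little to the model's outputs.
threshold = 0.3
pruned = np.where(np.abs(weights) < threshold, 0.0, weights)

print("max quantization error:", np.abs(weights - dequantized).max())
print("fraction of weights pruned:", (pruned == 0).mean())
```

The trade-off in both cases is the same: a small, measurable hit to fidelity in exchange for a model that is dramatically cheaper to store and run.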

Cox noted privacy and security are among the key advantages of teeny on-device AI models. After all, if you can do the computing on the device without sending sensitive information over the internet, that’s a win.

Then there’s the kind of AI IBM is working on in the form of Granite and its tiny time-series options (which we covered here). These are great for slightly more complex tasks – for instance, an HR chatbot – that need a bit more functionality but don’t also need to know physics the way an LLM would, Cox said.

Cox added there’s a new option on the table that allows AI consumers to “have their cake and eat it too”: activated low-rank adapters (aLoRAs) for its sub-10 billion parameter Granite family. Introduced by IBM a few weeks back, these are basically specialist add-ons that can be dynamically attached to and detached from a general model as needed, boosting specialist knowledge without bogging down performance.
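For the curious, here’s a minimal NumPy sketch of the general low-rank adapter idea – a small trainable update attached to a frozen weight matrix – assuming nothing about IBM’s actual aLoRA implementation:

```python
import numpy as np

d_in, d_out, rank = 64, 64, 4
rng = np.random.default_rng(1)

W = rng.normal(size=(d_in, d_out))        # frozen base weights
A = rng.normal(size=(d_in, rank)) * 0.01  # trainable adapter factor
B = rng.normal(size=(rank, d_out)) * 0.01 # trainable adapter factor

def forward(x, adapter_on=True):
    # The adapter trains only d_in*rank + rank*d_out parameters instead
    # of all d_in*d_out base weights, and can be toggled off to recover
    # the base model exactly.
    out = x @ W
    if adapter_on:
        out += x @ A @ B
    return out

x = rng.normal(size=(1, d_in))
print(np.allclose(forward(x, adapter_on=False), x @ W))  # True
```

The appeal is that the adapter is a tiny fraction of the base model’s size, so swapping specialties in and out is cheap compared to keeping a separate fine-tuned model for each task.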

Tiny AI now or later?

When will tiny AI make its big impact? Depends on who you ask and which kind of tiny AI you’re talking about.

Chen noted that “smaller, task-specific models are already reaching devices and markets globally.” But proliferation of teeny models will take some time, since being built into devices means “the time to market follows hardware timelines.” Given handset replacement cycles have been lengthening – and might lengthen further in the U.S. thanks to tariff-related price increases – it could take some more time before we feel the full impact.

As for the tiny models? Those are already in play and growing in the shadow of LLMs and AGI.

“Some people are just going straight to ‘oh giant models can solve all the problems,’ but it really can’t be very good at everything and it’s going to be expensive and be very slow for many of the things you want,” Cox concluded. “A lot of the tiny LLM world, that’s a huge rich space.”