AI

Small language models could be the next GenAI golden child

  • Many businesses are wary of GenAI adoption due to concerns over transparency and reliability
  • Purpose-built models and small language models (SLMs) trained on controlled, industry-specific data might emerge as more reliable and accurate alternatives
  • SLMs offer advantages in efficiency, data privacy and security

Generative AI (GenAI) quickly climbed to fame thanks in large part to large language models (LLMs) like ChatGPT and Microsoft Copilot. However, growing concern over transparency and reliability is causing many businesses to slow their roll on GenAI adoption. The solution could lie in more specialized, purpose-built models, according to Girish Pai, SVP and global head of AI & data at Hexaware.

“Part of the issue is that the most successful LLMs that exist today have basically been trained on the public Internet, which is publicly available information. While a lot of effort has gone into cleaning it up, it is largely uncontrolled information,” Pai told Fierce Network. “I think the transparency problem will reduce when we build models that are built for a specific purpose and with highly controlled data sets.”

Many businesses pursuing GenAI solutions are doing so using third-party models like the ones mentioned above. That includes most telcos, noted Priya Mehra, director of consulting firm Altman Solon.

For many applications, those models are fine-tuned with private data and rule sets. “That's what a lot of the tier-one CSPs are doing,” Mehra said. “Most telcos are more interested in doing that, fine-tuning a use case and building their own application.”
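In practice, that fine-tuning workflow often looks roughly like the sketch below. This is a minimal illustration assuming an open base model and the open-source Hugging Face transformers, peft and datasets libraries; the model name, dataset file and hyperparameters are placeholders, not details any of the operators cited here have disclosed.

```python
# Minimal sketch of fine-tuning a third-party base model on private data.
# All names (base model, file paths, hyperparameters) are illustrative.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base = "meta-llama/Meta-Llama-3-8B"  # any open base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# LoRA adapters: only a small set of extra weights is trained, so the
# private data adjusts behavior without retraining the full base model.
model = AutoModelForCausalLM.from_pretrained(base)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

# Private, domain-specific text (e.g. anonymized support transcripts).
data = load_dataset("json", data_files="private_corpus.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```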

While many enterprises also rely on this fine-tuning approach, there is growing interest across verticals in developing SLMs for more focused applications. 

SLMs are designed with specific industry needs in mind, and these smaller, specialized models are trained exclusively on relevant data, making them more accurate for their intended purposes. 

An LLM internalizes the vast amounts of data it has ingested, and sometimes there’s a need to unlearn what the model has been taught. Purpose-built models, by contrast, are trained only on what they need to know.

For example, an industry-specific model in healthcare could be trained on data, information and history from that space and will understand the domain far more deeply than a general-purpose model. Pai sees this as the next evolution in AI.

“If you're trying to solve a problem, let's say, around marketing segmentation, you don't necessarily need to know what the terms of the Versailles Treaty were, but an LLM does know those things. So, it's about restricting the knowledge of the model to things that are only relevant and useful to where you're deploying that model,” Pai explained. “We’re talking about actually training a model from scratch.”
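A rough sketch of what training from scratch can mean in practice, assuming the Hugging Face transformers library: the architecture is defined by a config and the weights start random, so the model never sees anything outside the curated domain corpus. The sizes below are hypothetical placeholders, not Hexaware's recipe.

```python
# Defining a small decoder-only model from scratch: random weights, so there
# is nothing to "unlearn", and the only knowledge it acquires comes from the
# curated, domain-specific corpus it is later pretrained on.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32_000,        # tokenizer built from the domain corpus only
    hidden_size=1024,         # hundreds of millions of parameters, not billions
    num_hidden_layers=12,
    num_attention_heads=16,
    intermediate_size=4096,
)

model = LlamaForCausalLM(config)  # randomly initialized, no public-internet pretraining
print(f"{model.num_parameters() / 1e6:.0f}M parameters")

# Pretraining then runs a standard causal-language-modeling loop (as in the
# fine-tuning sketch above) over the restricted corpus alone.
```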

SLMs show promise

Armand Ruiz, VP of product for IBM’s AI Platform, has also championed the rise of SLMs, touting their efficiency and adaptability compared to larger LLMs. 

“I'm very excited about small, specialized models that can outperform large, generalist models,” Ruiz wrote on LinkedIn last month. He also highlighted SLMs’ advantages in terms of data privacy, security and intellectual property control—key factors for businesses concerned about the risks associated with third-party AI solutions.

Similarly, Dell Technologies’ Senior Principal Engineer Raed Hijer has found that SLMs show “relatively good performance when compared to LLMs for specific use-cases.” Hijer said SLMs like Llama3-8B and Mixtral 8x22B are showing promising results in certain areas, such as question answering, sentiment analysis and reasoning.

“This suggests that other factors, besides the sheer size of the Language Model, play important roles in its performance,” Hijer wrote in a blog for the company.
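For illustration only, here is roughly how one of the tasks Hijer mentions, sentiment analysis, might be prompted against a locally hosted small model using the Hugging Face transformers pipeline; the model ID and prompt are assumptions for the sketch, not something drawn from Dell's testing.

```python
# Illustrative sketch: asking a locally hosted small instruct model to
# classify sentiment. Model ID and prompt are placeholders.
from transformers import pipeline

generator = pipeline("text-generation",
                     model="meta-llama/Meta-Llama-3-8B-Instruct",
                     device_map="auto")

prompt = ("Classify the sentiment of this customer message as positive, "
          "negative or neutral. Answer with one word.\n\n"
          "Message: The new billing portal is confusing and keeps timing out.")

result = generator(prompt, max_new_tokens=5)
print(result[0]["generated_text"])  # includes the prompt plus the model's answer
```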

As more companies weigh the benefits and risks of GenAI, the AI industry is at a crossroads. In the coming years, the competition between large generalist models and small, specialized models is likely to intensify.

Today, the models that are “most effective, and that hallucinate the least, are the most common,” Pai said. “It’s OpenAI, and it's Anthropic, and there's a preference to work with those models and fine-tune them, rather than looking at something that is new and only built for a purpose as of now. But I think that'll change.”