What is RAG? Hint: It has to do with AI and data.

  • Fierce Network reached out to Gartner analyst and AI expert Bern Elliot to answer some key questions we had about the origins of RAG
  • RAG is a grounding technique popularized by Meta AI researchers to improve the quality and relevance of AI-generated content
  • Elliot said the RAG story begins with transformer algorithms

Retrieval-augmented generation (RAG) is a grounding technique popularized by Meta AI researchers to improve the quality and relevance of AI-generated content by allowing large language models (LLMs) to access external knowledge sources beyond the model's training data.

In artificial intelligence (AI), grounding is the ability to connect model output to external sources of information. When a model is given access to specific data sources, it can consult relevant information in the same context as its user and offer more accurate results.

Where did RAG come from?

Fierce Network reached out to Gartner analyst and AI expert Bern Elliot to answer some key questions we had about the origins of RAG.

Despite the surprisingly long history behind large language models, the story, as Elliot understands it, begins with transformer algorithms.

“There was this thing that came along, transformer algorithms, that you could use to build something called foundation models. Now this technology — the model — had been around for a while, they just weren’t very useful because they were too big. But as you know, compute power has gotten much bigger recently,” he said.

"The ability to store things in the cloud and increases in processing speeds have made these algorithms, which were previously too big and cumbersome, possible. And the industry quickly discovered that if you do these kinds of models and make them really big, they come out with all kinds of interesting and useful properties," he explained, before noting that a lot of the topics surrounding RAG were brought to the industry’s attention long before they were packaged for the market.

“Some of these topics have been around for 40 years. So, probabilistic reasoning has been around for 40 years, but some of it is newer. Computational logic, optimization techniques, and natural language were around for a long, long time. First it used to be rule based, then it started being statistical, and now they’re actually using some large language models to help with natural language processing. Like they do with chatbots,” Elliot summarized.

How does RAG work?

RAG architectures compare user-submitted queries against a knowledge library stored in a vector database. An embedding model converts each query into a numerical representation, which is matched against the stored vectors to retrieve the most relevant information from the user-selected data repository. That retrieved information is appended to the original prompt, and the augmented prompt is sent to the AI model, which now has the context required to generate an accurate, actionable response. A minimal sketch of this pipeline appears below.
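The following is a simplified, illustrative sketch of that retrieve-augment-generate loop in Python. The embed() function, the document library, and the query are all made up for this example, and the final LLM call is omitted; a production system would use a real embedding model and a dedicated vector database.

```python
# A minimal RAG pipeline sketch (illustrative only). embed() is a toy
# stand-in for a real embedding model; production systems would call an
# actual embedding service and store vectors in a vector database.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Index: embed every document in the knowledge library.
library = [
    "RAG grounds LLM output in external knowledge sources.",
    "Fine-tuning adjusts a model's weights for a specific task.",
    "Transformers are the architecture behind foundation models.",
]
index = [(doc, embed(doc)) for doc in library]

# 2. Retrieve: embed the query and pick the most similar document.
query = "How does RAG ground a model's answers?"
query_vec = embed(query)
context, _ = max(index, key=lambda pair: cosine(query_vec, pair[1]))

# 3. Augment: prepend the retrieved context to the user's prompt.
#    The augmented prompt would then be sent to the LLM (call omitted).
augmented_prompt = f"Context: {context}\n\nQuestion: {query}"
print(augmented_prompt)
```

Running this prints the augmented prompt, showing how the retrieved document supplies context the model would otherwise lack.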

The difference between RAG and fine-tuning

Most firms do not train their own AI models, preferring to adapt pre-trained models through approaches such as RAG and fine-tuning. Because the two terms are easy to confuse, it is worth understanding how the techniques differ. Fine-tuning methodically adjusts a model's weights by training it further on task-specific data until the AI excels at a certain activity. RAG, on the other hand, leaves the model's weights untouched and instead retrieves information from external sources at query time to contextualize the user's request and produce more relevant results.
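To make the distinction concrete, here is a deliberately simplified sketch with made-up values: fine-tuning changes the model's parameters, while RAG changes only the input the model sees at inference time.

```python
# Illustrative contrast only; not a real training setup.

# Fine-tuning: one toy gradient-descent step on a single weight.
weight = 0.5
gradient = 0.1          # pretend gradient of the task loss w.r.t. the weight
learning_rate = 0.01
weight -= learning_rate * gradient   # the model itself is permanently changed

# RAG: weights stay untouched; retrieved text is spliced into the prompt.
retrieved = "Internal policy: refunds are processed within 14 days."
question = "How long do refunds take?"
prompt = f"Context: {retrieved}\nQuestion: {question}"
print(prompt)
```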

Why is RAG important?

LLMs are trained on massive amounts of data and countless parameters to respond to a wide variety of human questions. However, LLMs can be inconsistent: sometimes they get the answer right, but every now and then a confused LLM may stitch together plausible-sounding yet nonsensical information from its training data, essentially misleading its user.

This is a rather typical criticism of AI, particularly among businesses that rely on it for authoritative replies with citations. The underlying issue is that while LLMs understand how words relate statistically, they have no idea what those words mean.

RAG allows you to optimize the output of an LLM with targeted information that may be more current than the LLM's training data, or even specific to a particular company or industry. Furthermore, because RAG models link to credible sources of information, users can independently verify any claims the AI makes.

One of RAG's key benefits is that it is quite simple to deploy, making it faster and less expensive than retraining a model from scratch on new datasets. That same flexibility makes RAG well suited to switching between different sources of information as needed.

Some of the top AI offerings that include RAG are Microsoft Azure Machine Learning, OpenAI's ChatGPT Retrieval Plugin, HuggingFace Transformer Plugin, IBM Watsonx.ai and Meta AI.

