Maximizing Business Insights with Retrieval Augmented Generation
In 2023, generative AI applications like ChatGPT resonated through professional circles and the general public alike, as if the competition for buzzword of the year had already been decided. There is near-unanimous consent that Gen AI potentially opens up a plethora of business opportunities. However, it is important to note that Gen AI is known to produce highly plausible yet hallucinated content when its responses are not grounded in factual context. Integration with existing enterprise data can mitigate these hallucinations by providing the necessary context for factual question answering.
Artificial Intelligence (AI) is the current growth engine of the software industry. Microsoft’s results for the third quarter of 2023 illustrate this perfectly: its cloud computing division, where most of its AI business takes place, reported a 3 percentage point boost to its cloud business and a 19 percent increase in revenue compared to the same period in the prior year, amounting to a total of USD 24.26 billion for the quarter. This illustrates the trend of trying to solve almost any problem with AI, an impression easily gained from the business news headlines of nearly every day in 2023. Most of the space is taken up by applications categorized as generative AI (Gen AI), with ChatGPT as the prime example. Gen AI models learn patterns and structure from the immense volume of data they have been exposed to. As a result, they excel at offering coherent and nuanced responses, enabling them to generate new text, images, or even video based on learned patterns. However, this extensive training also makes them susceptible to hallucinations when asked to answer factual questions.
Can Gen AI do it all?
What could be more natural for any CIO than to take a closer look and try to integrate Gen AI into their own IT landscape? Deployed correctly, it no doubt offers countless benefits. But, just like traditional AI applications, Gen AI will only live up to expectations if reliable and relevant data sources or contexts are available to ensure quality outcomes.
Adopting LLMs – solutions of varying complexity and benefits
Gen AI models like ChatGPT are pre-trained models with which we interact by providing input text or images. To change the outcome or behavior of the model without re-training, we can change the input submitted to the model when generating an answer. This is called prompt engineering. To enable answering factual questions, we could insert all relevant documents into the prompt. However, we would quickly hit the maximum context window size that models support. Obviously, this would not be enough for an enterprise with terabytes of relevant data, even with GPT-4’s latest 128k context window, which amounts to roughly 200 single-spaced pages. Retrieval Augmented Generation (RAG) solves this problem by loading only the relevant pieces of information needed to answer the question or fulfil the task. Finally, fine-tuning allows the model to learn new capabilities to solve complex tasks for which the pre-training was not sufficient. An alternative solution for adapting an LLM to enterprise data is to train a model from scratch. This, however, requires enormous amounts of high-quality data, computing power, and time, while not necessarily yielding better results than fine-tuning a pre-trained model. Most existing LLMs are trained on large and diverse text corpora, such as Common Crawl or Wikipedia, which capture general language patterns and knowledge.
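To make the distinction between plain prompting and context-augmented prompting concrete, here is a minimal Python sketch. It assumes a hypothetical llm_complete helper that wraps whatever chat-completion API is in use; the prompt wording is illustrative only.

```python
def llm_complete(prompt: str) -> str:
    # Assumption: wraps the chat-completion API or SDK used in practice.
    raise NotImplementedError("plug in your provider's client here")

def answer_without_context(question: str) -> str:
    # Plain prompting: the model relies only on what it learned during training,
    # which is where hallucinations on enterprise-specific facts can creep in.
    return llm_complete(question)

def answer_with_context(question: str, passages: list[str]) -> str:
    # Prompt engineering: ground the answer in supplied passages and instruct
    # the model to say so when the context does not contain the answer.
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)
```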
Demystifying Retrieval Augmented Generation
Upgrading an LLM with a RAG approach divides the process of reaching an answer to a given problem into two steps: indexing and retrieval. When working with enterprise documents, these steps can be further broken down as follows:
Indexing
- Document ingestion: Parsing files, e.g., PDF or HTML, containing text, tables, diagrams, images, audio, and video elements
- Document chunking: Splitting the captured document into smaller pieces with specific content, and generating related metadata
- Document embedding: Generating for each chunk a vector that represents its content and positions it in a large semantic space; storing the vector in a database and indexing it (see the sketch after this list)
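A minimal sketch of the indexing step is shown below, assuming a hypothetical embed_text function (any embedding model or API could be plugged in) and a plain Python list as a stand-in for a vector database; real pipelines would use a dedicated vector store and smarter chunking.

```python
import numpy as np

def embed_text(text: str) -> np.ndarray:
    # Assumption: wraps the embedding model or API used in practice.
    raise NotImplementedError

def chunk_document(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Naive fixed-size chunking with overlap; production pipelines often split
    # on sections, paragraphs, or sentences instead.
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def index_document(doc_id: str, text: str, index: list[dict]) -> None:
    # Each entry stores the embedding plus metadata, so a response can later
    # be linked back to its source document and chunk.
    for i, chunk in enumerate(chunk_document(text)):
        index.append({
            "doc_id": doc_id,
            "chunk_no": i,
            "text": chunk,
            "embedding": embed_text(chunk),
        })
```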
Retrieval
- Document retrieval: Retrieving content related to the prompt. This may involve retrieving all chunks that are semantically related to the prompt, as well as surrounding chunks from the original document that may provide context.
- Metadata retrieval: The related metadata is also retrieved, which, among other things, makes it possible to provide a link to the source documents with the response
- Document ranking: Selecting the chunks that are most relevant to the prompt and inserting them into the context window (see the sketch below)
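Continuing the sketch from the indexing step (and reusing the hypothetical embed_text and llm_complete helpers), retrieval and ranking can be reduced to a similarity search followed by prompt assembly:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(question: str, index: list[dict], top_k: int = 5) -> list[dict]:
    # Rank all chunks by semantic similarity to the question and keep the
    # top_k most relevant ones, metadata included.
    q_vec = embed_text(question)
    ranked = sorted(
        index,
        key=lambda entry: cosine_similarity(q_vec, entry["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]

def answer(question: str, index: list[dict]) -> str:
    hits = retrieve(question, index)
    context = "\n\n".join(
        f"[{h['doc_id']} #{h['chunk_no']}] {h['text']}" for h in hits
    )
    prompt = (
        "Answer using only the context below and cite the sources in brackets.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)
```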
Benefits of RAG
Enriching the prompt with context that is relevant to the question at hand has been shown to yield well-informed responses, particularly when the provided context is focused and specific. Recent tests indicate that a smaller context is more likely to produce adequate results. Observations also show that the position of the specific data within the context has an impact: the LLM is more likely to pick up data that sits at the beginning or towards the end of the context. Done right, RAG is an effective and flexible method to provide enterprise-specific context to applications using Gen AI.
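As one illustration of the positioning effect, a simple mitigation (a sketch of the idea, not an established recipe) is to reorder the ranked chunks so that the most relevant ones land at the beginning and end of the context rather than in the middle:

```python
def reorder_for_context(chunks: list[dict]) -> list[dict]:
    # Assumes chunks are sorted by relevance, best first. Alternate placement
    # from both ends so the strongest evidence sits at the start and the end
    # of the context, with the weakest chunks pushed towards the middle.
    reordered: list = [None] * len(chunks)
    left, right = 0, len(chunks) - 1
    for i, chunk in enumerate(chunks):
        if i % 2 == 0:
            reordered[left] = chunk
            left += 1
        else:
            reordered[right] = chunk
            right -= 1
    return reordered
```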
Beware of quality and diversity problems
Of course, there are some drawbacks, especially regarding the content of the provided documents. RAG depends on the quality and coverage of the text corpus used for retrieval. If the corpus is incomplete, outdated, or biased, RAG may not be able to find relevant documents or passages for some queries, or it may retrieve incorrect or misleading information. If the documents vary greatly in quality or come from different sources and perspectives, the generated answers may be inconsistent with, or even contradictory to, the retrieved documents or passages.
The generated answers might not even be supported by any evidence from the retrieved sources, or they might be based on weak or vague connections. Difficulties can also arise with queries that are ambiguous, vague, or open-ended. It is therefore important to systematically evaluate RAG’s performance across different domains and tasks, and to provide feedback mechanisms that allow users to verify and correct answers where needed.
The evolving RAG architecture
The basic RAG architecture is relatively straightforward. The information to be used is contained in documents or records stored in existing corporate data sources; it is then indexed and retrieved in a structured manner as needed. Each of these processes, however, is critical to success. Indexing, for example, essentially maps a vast sea of data into a navigable landscape. When we delve deeper into the intricacies of advanced RAG applications, more sophisticated approaches emerge, and therein lies the profound potential. Emerging trends such as re-ranking, multi-query, multi-vector, and small-to-big retrieval strategies aim to refine the process further.
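To give a feel for one of these strategies, here is a hedged sketch of multi-query retrieval, reusing the hypothetical retrieve and llm_complete helpers from above: the LLM paraphrases the question, each variant is retrieved separately, and the results are merged with duplicates removed.

```python
def multi_query_retrieve(question: str, index: list[dict], top_k: int = 5) -> list[dict]:
    # Ask the LLM for alternative phrasings so that relevant chunks worded
    # differently from the original question are not missed.
    rewrites = llm_complete(
        "Rewrite the following question in three different ways, one per line:\n"
        + question
    ).splitlines()
    variants = [question] + [r for r in rewrites if r.strip()]

    seen, merged = set(), []
    for variant in variants:
        for hit in retrieve(variant, index, top_k):
            key = (hit["doc_id"], hit["chunk_no"])
            if key not in seen:
                seen.add(key)
                merged.append(hit)
    return merged[:top_k]
```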
Evaluation and monitoring of RAG applications
When deploying RAG applications, it is critical to evaluate the performance of the solution against the business needs, as should be done for any digital application. This involves assessing both functional and non-functional features. The non-deterministic nature of LLMs, as well as their known limitations and threats, requires specific attention.
Amongst other things, these specific attention points may relate to:
- Systematically evaluating the results concerning content and style of responses from the LLM
- Checking guardrails, for example to ensure that the level of bias and hallucinations is acceptable
- Providing explanations for the results, e.g., in the form of detailed steps or referencing of the original data sources
- Ensuring an adequate response time and an overall satisfactory user experience
- Preventing the risk of new types of cyber-attacks, such as prompt injection or data poisoning
- Keeping the cost of inference at an acceptable level
Such features must be monitored over time, as evolutions in LLMs and data sources will impact them.
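As an illustration only, and reusing the hypothetical retrieve and answer helpers from the earlier sketches, each request could be logged with a few such quality signals for later review; the checks and thresholds below are placeholders, not recommendations.

```python
import time

def evaluate_request(question: str, index: list[dict], max_latency_s: float = 5.0) -> dict:
    # Record latency, whether the answer references at least one retrieved
    # source, and the raw output, so results can be monitored over time.
    start = time.perf_counter()
    hits = retrieve(question, index)
    reply = answer(question, index)
    latency = time.perf_counter() - start
    return {
        "question": question,
        "answer": reply,
        "latency_s": round(latency, 2),
        "latency_ok": latency <= max_latency_s,
        "cites_source": any(h["doc_id"] in reply for h in hits),
    }
```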
Looking ahead
RAG is an effective but very recent architecture pattern and is expected to improve in the coming quarters. Examples of evolving innovation areas are the indexing of multi-modal and complex documents, e.g., taking advantage of LLMs’ ability to interpret images, and the retrieval of concise, relevant data at the right time during the construction of the response. Recent industry announcements, such as OpenAI’s release of GPTs and the GPT Store, are starting to give some reality to AI agents, a concept that has been on the agenda for many years.
Agents categorized as Generative Agents can perform complex tasks, e.g., outcome-driven workflows, by combining LLMs with memory, planning capabilities, reflection mechanisms, opportunities for exchange between agents, and orchestration applications. This also includes accessing external APIs or tools to retrieve data or trigger transactions. In the midst of a rapid Gen AI innovation wave, it is imperative for CIOs to stay aware of developments in RAG-related solutions and agent applications. LLM-powered agents have immense potential in scientific discovery, problem resolution, and autonomous decision-making. They enable the handling of intricate tasks and the generation of innovative solutions by leveraging the synergies between LLMs and supplementary components.
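To make the agent idea concrete, a deliberately simplified loop might look like the sketch below, reusing the hypothetical llm_complete helper and assuming tools is a dictionary of callables (e.g., a retrieval step or an API wrapper); real agent frameworks add far more robust planning, memory, and error handling.

```python
def run_agent(goal: str, tools: dict, max_steps: int = 5) -> str:
    # Minimal plan-act loop: at each step the LLM either calls a tool or
    # returns a final answer; tool results are kept in a simple memory.
    memory: list[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Goal: {goal}\n"
            f"Available tools: {', '.join(tools)}\n"
            "Previous steps:\n" + "\n".join(memory) + "\n"
            "Reply with 'TOOL <name> <input>' or 'FINAL <answer>'."
        )
        decision = llm_complete(prompt).strip()
        if decision.startswith("FINAL"):
            return decision.removeprefix("FINAL").strip()
        _, name, tool_input = decision.split(" ", 2)
        result = tools[name](tool_input)  # e.g., retrieval or a transaction API
        memory.append(f"{name}({tool_input}) -> {result}")
    return "No final answer within the step budget."
```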