Modular RAG – Three-layer structure of module types, modules, and operators.

The RAG flow covers the entire working process of a RAG system, from the input query to the generated output text. It typically coordinates multiple modules and operators, including retrievers, generators, and optional pre-processing and post-processing modules. The flow is designed to let Large Language Models (LLMs) draw on external knowledge bases or document sets when generating text, which improves the accuracy and relevance of responses.

The process of RAG inference generally follows these patterns:

Sequential: Linear process, including both advanced and simple RAG paradigms
Conditional: Choosing different RAG paths based on query keywords or semantics
Branching: Including multiple parallel branches, divided into pre-retrieval and post-retrieval branches
Loop: Including iterative, recursive, and adaptive retrieval structures
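
The four patterns above can be sketched as small orchestration functions. This is a minimal illustration, not a reference implementation: the function names, the keyword-based routing, and the naive query expansion in the loop are all assumptions made for the example, and the retriever and generator are passed in as plain callables.

```python
from typing import Callable, Dict, List

Retriever = Callable[[str], List[str]]
Generator = Callable[[str, List[str]], str]


def sequential(query: str, retrieve: Retriever, generate: Generator) -> str:
    """Linear flow: retrieve once, then generate."""
    return generate(query, retrieve(query))


def conditional(query: str, routes: Dict[str, Retriever],
                default: Retriever, generate: Generator) -> str:
    """Pick one retrieval path based on a keyword found in the query."""
    for keyword, retrieve in routes.items():
        if keyword in query.lower():
            return generate(query, retrieve(query))
    return generate(query, default(query))


def branching(query: str, retrievers: List[Retriever],
              generate: Generator) -> str:
    """Run several retrieval branches and merge results without duplicates.

    The branches run one after another here for simplicity; a real system
    might run them in parallel.
    """
    merged: List[str] = []
    for retrieve in retrievers:
        for doc in retrieve(query):
            if doc not in merged:
                merged.append(doc)
    return generate(query, merged)


def loop(query: str, retrieve: Retriever, generate: Generator,
         is_sufficient: Callable[[List[str]], bool],
         max_rounds: int = 3) -> str:
    """Iterative flow: retrieve again with an expanded query until the
    collected context is judged sufficient or nothing new is found."""
    docs: List[str] = []
    current = query
    for _ in range(max_rounds):
        new_docs = [d for d in retrieve(current) if d not in docs]
        docs.extend(new_docs)
        if is_sufficient(docs) or not new_docs:
            break
        # Naive query expansion (an assumption for this sketch).
        current = query + " " + " ".join(new_docs)
    return generate(query, docs)
```

Each function takes the same retriever and generator callables, so a pipeline can switch from one pattern to another without changing the modules themselves.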

Below is a flowchart of the Loop mode in RAG:

Basic RAG

The input module receives the user’s query, the retrieval module retrieves relevant documents from the knowledge base, and the output module generates answers based on the retrieval results.
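
A minimal sketch of these three modules follows. It assumes a tiny in-memory knowledge base and a keyword-overlap retriever, and the output module only assembles the prompt that a real system would send to an LLM; all names here (KNOWLEDGE_BASE, retrieve, generate, basic_rag) are hypothetical.

```python
import re
from typing import List, Set

# Toy knowledge base (illustrative data only).
KNOWLEDGE_BASE = [
    "RAG combines retrieval with text generation.",
    "A reranker reorders retrieved documents by relevance.",
    "Paris is the capital of France.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: List[str], top_k: int = 2) -> List[str]:
    """Retrieval module: rank documents by token overlap with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]


def generate(query: str, context: List[str]) -> str:
    """Output module: build the prompt. A real system would pass this
    prompt to an LLM and return the model's answer instead."""
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query


def basic_rag(query: str) -> str:
    """Input -> retrieval -> output, with no pre- or post-processing."""
    return generate(query, retrieve(query, KNOWLEDGE_BASE))


if __name__ == "__main__":
    print(basic_rag("What is the capital of France?"))
```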

Adding a Reranker Module

Here, a reranker module is added to Basic RAG to reorder the retrieval results.
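
As a rough sketch, a reranker takes the candidates returned by the retriever and re-sorts them with a second scoring function. Production rerankers are usually learned models; the Jaccard similarity used here is only a stand-in for illustration, and the names rerank and jaccard are hypothetical.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(query: str, doc: str) -> float:
    """Stand-in relevance score: token overlap divided by token union."""
    q, d = tokenize(query), tokenize(doc)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def rerank(query: str, candidates: List[str],
           score: Callable[[str, str], float] = jaccard,
           top_n: int = 2) -> List[str]:
    """Post-processing step: reorder retrieved candidates by score and
    keep the top_n most relevant ones."""
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Candidates in the order a first-stage retriever might return them.
    first_stage = [
        "Rerankers are a popular topic in search.",
        "Cooking pasta takes about ten minutes.",
        "A reranker reorders retrieved documents by relevance.",
    ]
    for doc in rerank("What does a reranker do with retrieved documents?", first_stage):
        print(doc)
```

Because the scoring function is a parameter, the stand-in can be swapped for a model-based scorer without touching the rest of the pipeline.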

Adding a Query Rewrite Module

So far, we have added a reranker module to the query pipeline, which is a post-processing operation on the retrieval results. Now, we will add a query rewrite module to perform pre-processing on the query.
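
The pre-processing step can be sketched as a function that runs before retrieval. In practice the rewrite is often produced by prompting an LLM; the rule-based version below is only an assumption for illustration, and the filler patterns and abbreviation table are hypothetical.

```python
import re

# Hypothetical abbreviation table used to expand short forms in the query.
ABBREVIATIONS = {
    "rag": "retrieval-augmented generation",
    "llm": "large language model",
}

# Leading conversational filler that carries no retrieval signal.
FILLER = re.compile(
    r"^(hey|hi|hello|please|can you tell me|could you tell me)[,\s]+",
    re.IGNORECASE,
)


def rewrite_query(query: str) -> str:
    """Pre-processing step: strip filler and expand abbreviations so the
    retriever receives a cleaner, more explicit query."""
    text = query.strip().rstrip("?.! ")
    previous = None
    # Remove leading filler phrases until none are left.
    while previous != text:
        previous = text
        text = FILLER.sub("", text)
    expanded = [ABBREVIATIONS.get(word.lower(), word) for word in text.split()]
    return " ".join(expanded)


if __name__ == "__main__":
    print(rewrite_query("Hey, can you tell me how RAG works?"))
```

The rewritten query is then passed to the retriever in place of the raw user input, so the retrieval and reranking modules stay unchanged.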

Sentence Window Retrieval

Sentence window retrieval works as follows. At indexing time, each document is split into sentences, and the sentences are stored in a database. At query time, a matching sentence is not returned on its own; the sentences before and after it are included as well. The number of surrounding sentences to include is set by a parameter. Finally, the expanded retrieval results are sent to the LLM to generate the answer.
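
The steps above can be sketched in a few lines. This example assumes a regex-based sentence splitter and keyword-overlap scoring in place of an embedding index; the parameter name window_size and the function names are hypothetical.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_window_retrieve(query: str, sentences: List[str],
                             window_size: int = 1,
                             top_k: int = 1) -> List[str]:
    """Match on single sentences, but return each hit together with
    window_size neighbouring sentences on either side."""
    q = tokenize(query)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: len(q & tokenize(sentences[i])),
                    reverse=True)
    results = []
    for i in ranked[:top_k]:
        start = max(0, i - window_size)                 # clamp at document start
        end = min(len(sentences), i + window_size + 1)  # clamp at document end
        results.append(" ".join(sentences[start:end]))
    return results


if __name__ == "__main__":
    text = ("RAG systems retrieve documents. "
            "Sentence window retrieval stores single sentences. "
            "Neighbouring sentences are added at query time. "
            "The LLM then writes the answer.")
    sentences = split_sentences(text)
    query = "How are sentences stored in sentence window retrieval?"
    print(sentence_window_retrieve(query, sentences, window_size=1))
```

Matching on single sentences keeps the search precise, while the window gives the LLM enough surrounding context to answer.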

Evaluation Module

Ragas is a framework used to evaluate RAG applications, providing numerous detailed evaluation metrics.