Rewrite-Retrieve-Read

This work introduces a new framework, Rewrite-Retrieve-Read, which replaces the previous retrieve-then-read paradigm for retrieval-augmented LLMs from the perspective of query rewriting. In this framework, a small language model is adopted as a trainable rewriter to cater to the downstream LLM.

Figure 1. Overview of the proposed pipeline. (a) Standard retrieve-then-read method. (b) LLM as a query rewriter. (c) Pipeline with a trainable rewriter. (Image source: Query Rewriting for Retrieval-Augmented Large Language Models)

As the figure shows, a complex query can be split into several sub-queries, which helps the retriever recall precise contexts more efficiently. In practice, the authors use reinforcement learning to train the rewriter, which is undoubtedly a costly process.
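
A minimal Python sketch of this pipeline may help make it concrete; `small_rewriter`, `web_search`, and `llm_reader` below are hypothetical placeholders for the trainable rewriter, the retriever, and the frozen downstream reader, not the paper's actual code.

```python
from typing import List


def small_rewriter(question: str) -> List[str]:
    # Placeholder for the trainable rewriter (a small LM); here it simply
    # passes the question through unchanged instead of reformulating it.
    return [question]


def web_search(query: str, top_k: int = 5) -> List[str]:
    # Placeholder for the retriever.
    return [f"<context {i} for '{query}'>" for i in range(top_k)]


def llm_reader(prompt: str) -> str:
    # Placeholder for the frozen downstream LLM reader.
    return "<answer>"


def rewrite_retrieve_read(question: str) -> str:
    queries = small_rewriter(question)                      # rewrite
    contexts = [c for q in queries for c in web_search(q)]  # retrieve
    prompt = "\n".join(contexts) + f"\n\nQuestion: {question}\nAnswer:"
    return llm_reader(prompt)                               # read
```

The key design choice is that only the small rewriter is trained, while the retriever and the large reader stay frozen.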

EfficientRAG

Standard RAG struggles to handle complex questions such as multi-hop queries. In this paper, the authors introduce EfficientRAG, which iteratively generates new queries without requiring LLM calls at each iteration and filters out irrelevant information.
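
A rough sketch of the iterative loop follows, assuming a lightweight component that tags each retrieved chunk as useful or terminal and a filter that composes the next-hop query; all names below are hypothetical placeholders, not the paper's API.

```python
from typing import List, Tuple


def retrieve(query: str, top_k: int = 4) -> List[str]:
    # Placeholder retriever.
    return [f"<chunk {i} for '{query}'>" for i in range(top_k)]


def label_and_tag(query: str, chunk: str) -> Tuple[str, bool]:
    # Placeholder labeler/tagger: returns the useful spans in the chunk and
    # whether this branch still needs another retrieval hop.
    return "<useful spans>", False


def next_hop_query(query: str, labeled_spans: str) -> str:
    # Placeholder filter that composes the next-hop query.
    return f"{query} [{labeled_spans}]"


def efficient_rag(question: str, max_hops: int = 3) -> List[str]:
    collected: List[str] = []
    queries = [question]
    for _ in range(max_hops):
        next_queries = []
        for q in queries:
            for chunk in retrieve(q):
                spans, keep_going = label_and_tag(q, chunk)
                collected.append(chunk)
                if keep_going:
                    next_queries.append(next_hop_query(q, spans))
        if not next_queries:   # every branch terminated
            break
        queries = next_queries
    return collected           # handed to the reader LLM in a single call
```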

RankRAG

LLMs are not good at reading too many chunked contexts (e.g., top-100), even with a long context window. RankRAG designs an RAG instruction-tuning pipeline that uses a single language model to achieve both high-recall context extraction and high-quality content generation. Its most significant contribution is that both context ranking and answer generation are handled within one framework.

Figure 1. The pipeline of RankRAG. (Image source: RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs)

  1. Stage I: Supervised Fine-Tuning (SFT). The authors use 128K SFT examples in total (e.g., OpenAssistant, Dolly, SODA, ELI5, Self-Instruct, Unnatural Instructions), adopt the multi-turn conversation format, use the previous turns of conversation between the user and the assistant as context, and compute the loss only on the last response from the assistant.
  2. Stage II: Unified Instruction-Tuning for Ranking and Generation. Stage II consists of the following parts:
    1. SFT data from Stage I: maintains the instruction-following capability.
    2. Context-rich QA data: i) standard QA and reading comprehension datasets: DROP, NarrativeQA, Quoref, ROPES, NewsQA, TAT-QA; ii) conversational QA datasets: HumanAnnotatedConvQA, SyntheticConvQA.
    3. Retrieval-augmented QA data: SQuAD, WebQuestions. In these two datasets, not all the retrieved contexts contain the answer, thus they can be thought of as involving ‘hard-negative’ contexts.
    4. Context ranking data: MS MARCO passage ranking dataset.
    5. Retrieval-augmented ranking data: SQuAD, WebQuestions. For each example, a gold context is combined with other contexts retrieved by BM25, and the LLM is trained to explicitly identify all contexts relevant to the question. Finally, all of the above data are cast into a standardized QA form ($x$, $c$, $y$), where $x$ is the question, $c$ is the corresponding context, and $y$ is the target output answer (a toy conversion sketch follows Figure 2 below).
      Figure 2. Converting question, context, and answer into the standardized QA form. (Image source: RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs)
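
As a toy illustration of this conversion, each example could be serialized with a template like the one below; the prompt format is a hypothetical sketch, not the paper's verbatim template.

```python
def to_standard_qa(x: str, c: str, y: str) -> dict:
    # Cast one example into the standardized (question, context, answer)
    # form; the template here is illustrative only.
    prompt = (
        "System: This is a chat between a user and an assistant.\n"
        f"Context: {c}\n"
        f"User: {x}\n"
        "Assistant:"
    )
    # As in Stage I, the training loss would be computed only on the target.
    return {"prompt": prompt, "target": y}
```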

RankRAG Inference: Retrieve-Rerank-Generate Pipeline

The inference process of RankRAG can be described as follows: 1) the retriever $\mathcal{R}$ retrieves the top-$N$ contexts from the knowledge base; 2) the RankRAG model calculates the relevance score between the question and each of the $N$ retrieved contexts and retains only the top-$k$; 3) the top-$k$ contexts, along with the question, are concatenated into a prompt and fed back into the RankRAG model to generate the final answer.
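
A minimal sketch of this three-step loop is given below; the retrieval and scoring functions and the `generate` method are placeholders standing in for the retriever and the instruction-tuned RankRAG model, not the paper's actual API.

```python
from typing import List


class DummyModel:
    # Stand-in for the instruction-tuned RankRAG model.
    def generate(self, prompt: str) -> str:
        return "<answer>"


def retrieve_top_n(question: str, n: int = 100) -> List[str]:
    # Placeholder for the retriever R over the knowledge base.
    return [f"<context {i}>" for i in range(n)]


def rank_score(model, question: str, context: str) -> float:
    # Placeholder: in RankRAG the *same* model judges the relevance of each
    # (question, context) pair; here we return a dummy score.
    return 0.0


def rankrag_inference(model, question: str, n: int = 100, k: int = 5) -> str:
    # 1) retrieve top-N candidate contexts
    candidates = retrieve_top_n(question, n)
    # 2) rerank with the RankRAG model and keep only the top-k
    top_k = sorted(candidates,
                   key=lambda c: rank_score(model, question, c),
                   reverse=True)[:k]
    # 3) concatenate the top-k contexts with the question and generate
    prompt = "\n".join(top_k) + f"\n\nQuestion: {question}\nAnswer:"
    return model.generate(prompt)


print(rankrag_inference(DummyModel(), "Who wrote Hamlet?"))
```

Note that the same model is called $k + 1$ times per question (once per reranked candidate plus once for generation), which trades extra compute at inference time for a much shorter, higher-precision reading prompt.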