Rewrite-Retrieve-Read

This work introduces a new framework, Rewrite-Retrieve-Read, which replaces the previous retrieve-then-read paradigm for retrieval-augmented LLMs from the perspective of query rewriting. In this framework, a small language model is adopted as a trainable rewriter to cater to the downstream LLM.

Figure 1. Overview of the proposed pipeline. (a) Standard retrieve-then-read method. (b) LLM as a query rewriter. (c) Pipeline with a trainable rewriter. (Image source: Query Rewriting for Retrieval-Augmented Large Language Models)

As the figure shows, a complex query can be split into several sub-queries, which helps the retriever recall precise contexts more efficiently. In practice, the authors use reinforcement learning to train the rewriter, which is undoubtedly a costly process.
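
A minimal Python sketch of this pipeline may help make it concrete; `small_rewriter`, `web_search`, and `llm_reader` below are hypothetical placeholders for the trainable rewriter, the retriever, and the frozen downstream reader, not the paper's actual code.

```python
from typing import List


def small_rewriter(question: str) -> List[str]:
    # Placeholder for the trainable rewriter (a small LM); here it simply
    # passes the question through unchanged instead of reformulating it.
    return [question]


def web_search(query: str, top_k: int = 5) -> List[str]:
    # Placeholder for the retriever.
    return [f"<context {i} for '{query}'>" for i in range(top_k)]


def llm_reader(prompt: str) -> str:
    # Placeholder for the frozen downstream LLM reader.
    return "<answer>"


def rewrite_retrieve_read(question: str) -> str:
    queries = small_rewriter(question)                      # rewrite
    contexts = [c for q in queries for c in web_search(q)]  # retrieve
    prompt = "\n".join(contexts) + f"\n\nQuestion: {question}\nAnswer:"
    return llm_reader(prompt)                               # read
```

The key design choice is that only the small rewriter is trained, while the retriever and the large reader stay frozen.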

EfficientRAG

Standard RAG struggles to handle complex questions such as multi-hop queries. In this paper, the authors introduce EfficientRAG, which iteratively generates new queries without requiring LLM calls at each iteration and filters out irrelevant information.
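
A rough sketch of the iterative loop follows, assuming a lightweight component that tags each retrieved chunk as useful or terminal and a filter that composes the next-hop query; all names below are hypothetical placeholders, not the paper's API.

```python
from typing import List, Tuple


def retrieve(query: str, top_k: int = 4) -> List[str]:
    # Placeholder retriever.
    return [f"<chunk {i} for '{query}'>" for i in range(top_k)]


def label_and_tag(query: str, chunk: str) -> Tuple[str, bool]:
    # Placeholder labeler/tagger: returns the useful spans in the chunk and
    # whether this branch still needs another retrieval hop.
    return "<useful spans>", False


def next_hop_query(query: str, labeled_spans: str) -> str:
    # Placeholder filter that composes the next-hop query.
    return f"{query} [{labeled_spans}]"


def efficient_rag(question: str, max_hops: int = 3) -> List[str]:
    collected: List[str] = []
    queries = [question]
    for _ in range(max_hops):
        next_queries = []
        for q in queries:
            for chunk in retrieve(q):
                spans, keep_going = label_and_tag(q, chunk)
                collected.append(chunk)
                if keep_going:
                    next_queries.append(next_hop_query(q, spans))
        if not next_queries:   # every branch terminated
            break
        queries = next_queries
    return collected           # handed to the reader LLM in a single call
```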

RankRAG

LLMs are not good at reading too many chunked contexts (e.g., top-100), even with a long context window. RankRAG designs an RAG instruction-tuning pipeline that uses a single language model to achieve both high-recall context extraction and high-quality content generation. Its most significant contribution is that both context ranking and answer generation are handled within one framework.

Figure 1. The pipeline of RankRAG. (Image source: RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs)

  1. Stage I: Supervised Fine-Tuning (SFT). The authors use 128K SFT examples in total (e.g., OpenAssistant, Dolly, SODA, ELI5, Self-Instruct, Unnatural Instructions), adopt the multi-turn conversation format, use the previous turns of conversation between the user and the assistant as context, and compute the loss only on the last response from the assistant.
  2. Stage II: Unified Instruction-Tuning for Ranking and Generation. Stage II consists of the following parts:
    1. SFT data from Stage I: maintains the instruction-following capability.
    2. Context-rich QA data: i) standard QA and reading comprehension datasets: DROP, NarrativeQA, Quoref, ROPES, NewsQA, TAT-QA; ii) conversational QA datasets: HumanAnnotatedConvQA, SyntheticConvQA.
    3. Retrieval-augmented QA data: SQuAD, WebQuestions. In these two datasets, not all the retrieved contexts contain the answer, thus they can be thought of as involving ‘hard-negative’ contexts.
    4. Context ranking data: MS MARCO passage ranking dataset.
    5. Retrieval-augmented ranking data: SQuAD, WebQuestions. For each example, a gold context is combined with other contexts retrieved by BM25, and the LLM is trained to explicitly identify all contexts relevant to the question. Finally, all of the above data are cast into a standardized QA form ($x$, $c$, $y$), where $x$ is the question, $c$ is the corresponding context, and $y$ is the target output answer (a toy conversion sketch follows Figure 2 below).
      Figure 2. Converting question, context, and answer into the standardized QA form. (Image source: RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs)
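
As a toy illustration of this conversion, each example could be serialized with a template like the one below; the prompt format is a hypothetical sketch, not the paper's verbatim template.

```python
def to_standard_qa(x: str, c: str, y: str) -> dict:
    # Cast one example into the standardized (question, context, answer)
    # form; the template here is illustrative only.
    prompt = (
        "System: This is a chat between a user and an assistant.\n"
        f"Context: {c}\n"
        f"User: {x}\n"
        "Assistant:"
    )
    # As in Stage I, the training loss would be computed only on the target.
    return {"prompt": prompt, "target": y}
```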

RankRAG Inference: Retrieve-Rerank-Generate Pipeline

The inference process of RankRAG can be described as follows: 1) the retriever $\mathcal{R}$ retrieves the top-$N$ contexts from the knowledge base; 2) the RankRAG model calculates the relevance score between the question and each of the $N$ retrieved contexts and retains only the top-$k$; 3) the top-$k$ contexts, along with the question, are concatenated into a prompt and fed back into the RankRAG model to generate the final answer.
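
A minimal sketch of this three-step loop is given below; the retrieval and scoring functions and the `generate` method are placeholders standing in for the retriever and the instruction-tuned RankRAG model, not the paper's actual API.

```python
from typing import List


class DummyModel:
    # Stand-in for the instruction-tuned RankRAG model.
    def generate(self, prompt: str) -> str:
        return "<answer>"


def retrieve_top_n(question: str, n: int = 100) -> List[str]:
    # Placeholder for the retriever R over the knowledge base.
    return [f"<context {i}>" for i in range(n)]


def rank_score(model, question: str, context: str) -> float:
    # Placeholder: in RankRAG the *same* model judges the relevance of each
    # (question, context) pair; here we return a dummy score.
    return 0.0


def rankrag_inference(model, question: str, n: int = 100, k: int = 5) -> str:
    # 1) retrieve top-N candidate contexts
    candidates = retrieve_top_n(question, n)
    # 2) rerank with the RankRAG model and keep only the top-k
    top_k = sorted(candidates,
                   key=lambda c: rank_score(model, question, c),
                   reverse=True)[:k]
    # 3) concatenate the top-k contexts with the question and generate
    prompt = "\n".join(top_k) + f"\n\nQuestion: {question}\nAnswer:"
    return model.generate(prompt)


print(rankrag_inference(DummyModel(), "Who wrote Hamlet?"))
```

Note that the same model is called $k + 1$ times per question (once per reranked candidate plus once for generation), which trades extra compute at inference time for a much shorter, higher-precision reading prompt.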