# Generate Rather Than Retrieve: Large Language Models are Strong Context Generators
###### tags: `notes`, `NLP`, `ICLR 2023`
> https://arxiv.org/pdf/2209.10063.pdf
## Abstract
- Motivation: Tackle knowledge-intensive tasks such as open-domain QA, which require vast world or domain knowledge, by generating contextual documents with large language models (LLMs) instead of retrieving documents from an external corpus.
- The study introduces generate-then-read (GENREAD), a novel approach that outperforms the state-of-the-art retrieve-then-read pipeline in accuracy without retrieving any external documents.
## Introduction
- Knowledge-intensive tasks are challenging even for humans without access to external knowledge sources such as Wikipedia.
- The proposed method leverages LLMs such as InstructGPT to generate contextual documents for a given question; the generated documents often contain more accurate and relevant information than retrieved ones.
## Methodology
- Generate-then-Read (GENREAD): a large LM first generates contextual documents conditioned on the question; a reader model then uses those documents to predict the final answer.
- Clustering-based prompts: a novel prompting method that elicits diverse documents covering different perspectives, improving the recall of acceptable answers.
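The two-step pipeline above can be sketched as follows. This is a minimal, runnable illustration, not the paper's implementation: `generate` is a stub standing in for an LLM call (e.g. to InstructGPT), and the prompt wording is a hypothetical example.

```python
def generate(prompt: str) -> str:
    """Stub LLM. A real system would call a large language model here;
    this canned version only makes the control flow runnable."""
    if prompt.endswith("Answer:"):
        return "Paris"  # stand-in for the reader's predicted answer
    return "Paris is the capital and most populous city of France."

def generate_then_read(question: str, n_docs: int = 1) -> str:
    # Step 1 (generate): produce contextual documents conditioned on the
    # question. With clustering-based prompts, each call would use a prompt
    # built from a different cluster of question-document demonstrations,
    # so the n_docs documents cover different perspectives.
    docs = [
        generate(f"Generate a background document to answer the question: {question}")
        for _ in range(n_docs)
    ]
    # Step 2 (read): answer the question conditioned on the generated
    # documents (the paper uses a reader such as FiD, or the LLM itself).
    context = "\n".join(docs)
    reader_prompt = f"{context}\nQuestion: {question}\nAnswer:"
    return generate(reader_prompt)
```

The key design point is that step 1 replaces the retriever of a retrieve-then-read pipeline; no external corpus or index is consulted.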
## Experiments
- Datasets: three knowledge-intensive task types were used, covering open-domain QA, fact checking, and dialogue:
    - Open-domain QA: **TriviaQA**, **WebQ**
    - Fact checking: FEVER, FM2
    - Dialogue system: [WoW](https://parl.ai/projects/wizard_of_wikipedia/)
- Results: GENREAD achieved significant improvements, with exact match (EM) scores of 71.6 and 54.4 on TriviaQA and WebQ, respectively, outperforming existing retrieve-then-read methods.
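The EM scores above use the standard open-domain QA exact-match metric: a prediction counts as correct if, after light normalization, it equals any acceptable gold answer. A common SQuAD-style normalization (lowercasing, stripping punctuation and articles, collapsing whitespace) can be sketched as:

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, fix whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """1.0 if the normalized prediction equals any normalized gold answer."""
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))
```

For example, `exact_match("The Eiffel Tower", ["eiffel tower"])` scores 1.0 because normalization removes the leading article and casing.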
## Takeaways
- The proposed generate-then-read (GENREAD) method offers a new perspective on knowledge-intensive tasks: instead of retrieving documents from external sources, it uses a large language model to directly generate contextual documents.
- GENREAD matches or exceeds the performance of state-of-the-art retrieve-then-read pipelines without requiring any external knowledge source.
- The clustering-based prompting method generates diverse documents covering different perspectives on a question, improving the recall of acceptable answers among the generated documents.
> The contents shared herein are quoted verbatim from the original author and are intended solely for personal note-taking and reference purposes following a thorough reading. Any interpretation or annotation provided is strictly personal and does not claim to reflect the author's intended meaning or context.