# Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

###### tags: `筆記`, `study notes`, `NLP`

> https://arxiv.org/pdf/2404.07143.pdf

## Abstract

- Motivation: address the scalability challenge of Transformer-based Large Language Models (LLMs) so they can handle infinitely long inputs with bounded memory and computation.
- This work introduces Infini-attention, which integrates a compressive memory into the vanilla attention mechanism and combines masked local attention with long-term linear attention in a single Transformer block.

## Introduction

- Highlights the limitations of current Transformers in handling long-context sequences due to the quadratic complexity of the attention mechanism.
- Introduces Infini-attention as a solution for processing infinitely long contexts efficiently with minimal memory overhead.

## Methodology

- Infini-attention
    - ![image](https://hackmd.io/_uploads/HJhEvbKg0.png)
    - Describes the structure and function of Infini-attention, which incorporates a compressive memory to handle infinitely long input sequences within a bounded memory footprint (a sketch of the retrieval/update rule is given at the end of these notes).

## Experiments

- Datasets: long-context language modeling benchmarks, a passkey context block retrieval task with 1M sequence length, and a 500K-length book summarization task.
- Metrics: improvements over baseline models, achieving state-of-the-art results in memory efficiency and task performance.

## Takeaways

- Infini-attention significantly improves both the performance and the memory efficiency of long-context processing.
- With continual pre-training and task fine-tuning, the model can effectively handle inputs of up to one million tokens.
- Experimental results confirm the model's strong performance on long-context language modeling and book summarization tasks.

> STATEMENT: The contents shared herein are quoted verbatim from the original author and are intended solely for personal note-taking and reference purposes following a thorough reading. Any interpretation or annotation provided is strictly personal and does not claim to reflect the author's intended meaning or context.
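
## Appendix: Infini-attention Sketch

To make the Methodology notes concrete, here is a minimal NumPy sketch of one Infini-attention head processing one segment, following the paper's retrieval, gating, and (linear, non-delta-rule) memory-update equations: the head reads long-term content from the compressive memory via linear attention, computes standard causal dot-product attention over the current segment, blends the two with a sigmoid gate, and then writes the segment's keys and values back into the memory. Function names, shapes, the `1e-8` epsilon, and the random usage loop are illustrative choices of mine, not code from the paper.

```python
import numpy as np

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1: keeps features positive for the linear-attention memory.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def causal_softmax_attention(Q, K, V):
    # Standard scaled dot-product attention with a causal mask (local, within-segment).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = np.where(np.triu(np.ones_like(scores, dtype=bool), k=1), -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def infini_attention_segment(Q, K, V, M, z, beta):
    """One Infini-attention head on one segment.

    Q, K: (N, d_key); V: (N, d_value)
    M:    (d_key, d_value) compressive memory carried across segments
    z:    (d_key,) normalization term carried across segments
    beta: learned scalar controlling the long-term/local mix
    """
    sigma_q, sigma_k = elu_plus_one(Q), elu_plus_one(K)

    # 1) Memory retrieval: A_mem = sigma(Q) M / (sigma(Q) z)
    A_mem = (sigma_q @ M) / (sigma_q @ z + 1e-8)[:, None]

    # 2) Local causal dot-product attention over the current segment.
    A_dot = causal_softmax_attention(Q, K, V)

    # 3) Gate: A = sigmoid(beta) * A_mem + (1 - sigmoid(beta)) * A_dot
    g = 1.0 / (1.0 + np.exp(-beta))
    A = g * A_mem + (1.0 - g) * A_dot

    # 4) Memory update (linear rule): M += sigma(K)^T V, z += sum_t sigma(K_t)
    return A, M + sigma_k.T @ V, z + sigma_k.sum(axis=0)

# Usage: stream segments of a long input through the head, carrying (M, z) forward,
# so the per-segment state stays bounded regardless of total context length.
d_key, d_value, N = 64, 64, 128
M, z, beta = np.zeros((d_key, d_value)), np.zeros(d_key), 0.0
for _ in range(4):
    Q = np.random.randn(N, d_key)
    K = np.random.randn(N, d_key)
    V = np.random.randn(N, d_value)
    out, M, z = infini_attention_segment(Q, K, V, M, z, beta)
```

The gate `beta` is what lets a single head trade off between the compressive (long-term) read and the local softmax attention; the paper also describes a delta-rule variant of the memory update, which this sketch omits for brevity.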