2024.03.24 - [GTC] Watch Party: Retrieval Augmented Generation (RAG): Overview of Design Systems, Data, and Customization [WP62744a]
===

###### tags: `meeting`, `talk`, `Nvidia`, `GTC`, `RAG`

> - Watch Party: Retrieval Augmented Generation (RAG): Overview of Design Systems, Data, and Customization [WP62744a]
>   https://register.nvidia.com/flow/nvidia/gtcs24/attendeeportaldigital/page/sessioncatalog/session/1706586013755001Sz9Z
> - Retrieval Augmented Generation: Overview of Design Systems, Data, and Customization [S62744]
>     - https://register.nvidia.com/flow/nvidia/gtcs24/attendeeportaldigital/page/sessioncatalog/session/1697796256485001xubl
>     - S62744 - Retrieval Augmented Generation. Overview of Design Systems, Data, and Customization.pdf

[TOC]

:::warning
### :bulb: Intro
GTC Watch Party is a session format hosted by local NVIDIA experts and presented in Chinese: the hosts lead participants through a selected talk in sync, adding commentary and live Q&A. The session is interactive, and you are encouraged to join the discussion in the chat with insights or questions. Note that the original speakers listed below may not attend this session.

**Session host:** **Yipeng Li**, Integrated Marketing Manager, NVIDIA

Discover the potential of retrieval augmented generation (RAG) with NVIDIA technologies. RAG systems combine information retrieval and generative models by retrieving relevant document passages from a large corpus, and then use them as context for generating detailed answers. We'll cover the design of end-to-end RAG systems, including data preparation and retriever and generator models. We'll showcase an example of RAG system using NVIDIA TensorRT-LLM and NeMo. We'll cover RAG models evaluation and customization for specific tasks.
- **Miguel Martinez**, Senior Deep Learning Data Scientist, NVIDIA
- **Meriem Bendris**, Senior Deep Learning Data Scientist, NVIDIA
- **Sergio Perez Perez**, Solution Architect, NVIDIA
- **Dora Csillag**, Senior Solutions Architect - GenAI&Inference, NVIDIA

**Industry**: All Industries
:::

- LLMs are Powerful Tools but Not Accurate Enough for Enterprise
    - Risk of outdated information
    - Lacking proprietary knowledge
    - Hallucinations
    - notes
        - Limitations of large language models
            - Once pretraining is complete, the model's knowledge starts to go out of date
- Retrieval Augmented Generation (RAG)
    - Answers can be customized
    - Can it solve every potential hallucination problem?
        - No, but it reduces hallucinations significantly
    - Motivation and purpose:
        - Model knowledge is limited; without updates it becomes outdated
            - Enterprises find it hard to run the whole training process themselves
            - Remedy: take a pretrained model and fine-tune it with datasets or documents to meet business needs
        - LLM overfitting / poor generalization / data bias / imbalanced data distribution -> hallucinations -> confidently stated nonsense
            - Remedy: RAG is a good approach
    - RAG searches a knowledge base by vector similarity (for example, cosine similarity)
    - Answer generation
        - The LLM makes full use of the external knowledge base, that is, it incorporates the retrieved context, to generate the answer
- ChipNeMo Use Case
    - ChipNeMo is an NVIDIA solution aimed at enterprise problems
    - NeMo framework is an end-to-end collection of frameworks for building, customizing, and deploying large models and conversational AI models; it is the umbrella framework
    - NeMo-Guardrails is an open-source toolkit whose main focus is safety. It adds programmable guardrails to LLM-based conversational systems to control what the LLM outputs, for example keeping it away from politically sensitive topics and from illegal or non-compliant content.
        - 2023/4/26 - [NVIDIA launches NeMo Guardrails to stop AI chatbots from confidently talking nonsense (in Chinese)](https://www.techbang.com/posts/105781-nvidia-nemo-guardrails)
    - NeMo Megatron is a toolkit for distributed training of large NLP models
        - 2022/7/31 - [NVIDIA AI platform brings a 30% training speedup to the NeMo Megatron framework for large language models (in Chinese)](https://news.xfastest.com/nvidia/115750/nvidia-ai-nemo-megatron/)

<hr>

## Slide content

### Slide3 - LLMs are Powerful Tools but Not Accurate Enough for Enterprise
- Without a connection to enterprise data sources, LLMs cannot provide accurate information
    - Risk of outdated information
      (note: the moment an LLM finishes training, its information starts to go stale)
    - Lacking proprietary knowledge
      (note: internal enterprise knowledge and documents were not part of the training data)
    - Hallucinations

---

### Slide4
- **Think of RAG as a LLM customization**
- **Custom models tailor language processing capabilities to specific use cases and domain
  knowledge**

---

### Slide7
**ChipNeMo - An example of RAG within NVIDIA**
- **ChipNeMo explores the applications of large language models (LLMs) for industrial chip design.**
- **This paper illustrates the use of diverse domain adaptation techniques, including:**
    - **Custom tokenizers.**
    - **Domain-adaptive pretraining (DAPT).**
    - **Supervised fine-tuning (SFT).**
    - **Domain-adapted retrieval augmented generation (RAG).**
- **It evaluates the performance of these methods on three distinct LLM applications for chip design:**
    - **An engineering assistant chatbot.**
    - **EDA script generation.**
    - **Bug analysis and summarization.**
- **It details a process portable to any industry.**

---

### Slide10
**ChipNeMo - Adapting LLaMA2 Pretrained Tokenizer**
- **Goals when adapting a pre-trained tokenizer:**
    - **Improving tokenization efficiency on domain-specific data.**
    - **Maintaining the language model performance.**
- **Customized tokenizers reduced DAPT token count by up to 3.3% without hurting its effectiveness on applications.**
- **Steps followed when adapting the LLaMA2 tokenizer:**
    - **Training a new tokenizer using domain-specific data.**
    - **From the vocabulary of the new tokenizer, identify the tokens that are missing in the LLaMA2 tokenizer.**
    - **Expand LLaMA2 tokenizer with the new tokens.**
    - **Initialize the embeddings of the new tokens by averaging the embeddings of the tokens generated by LLaMA2 tokenizer.**
    - **In the transformer’s output layer, the weights
      corresponding to the new tokens are initialized to zero.**

---

### Slide11
- Domain-Adaptive Pretraining exerts a substantial positive impact on tasks within the domain itself.
- It also exhibits a slight degradation in accuracy on open-domain academic benchmarks.
- The use of larger foundation models yields better zero-shot results on domain-specific tasks.
- DAPT accounts for less than 1.5% of the cost of pretraining a foundation model from scratch.
    - ChipNeMo 7B DAPT required 2620 GPU hours.
    - ChipNeMo 13B DAPT required 4940 GPU hours.
    - Pretraining run on 128 NVIDIA A100 GPUs, spread across 16 NVIDIA DGX A100 servers.
- Other parameter-efficient fine-tuning (PEFT) methods such as LoRA adapters revealed a substantial accuracy gap on in-domain tasks when compared to DAPT.

<hr>

## References
- ### 2023/4/26 - [NVIDIA launches NeMo Guardrails to stop AI chatbots from confidently talking nonsense (in Chinese)](https://www.techbang.com/posts/105781-nvidia-nemo-guardrails)
    ![](https://hackmd.io/_uploads/rJQ-hNzlR.png)
    - NeMo Guardrails features at a glance
        - **Topical guardrails**: keep the application from drifting into unwanted areas. For example, they can stop a customer-service assistant from answering questions about the weather.
        - **Safety guardrails**: ensure the application replies with accurate, appropriate information. They can filter out unwanted language and enforce that only trusted sources are cited.
        - **Security guardrails**: restrict the application to external connections that are known to be safe.
- ### 2023/4/26 - [NVIDIA open-source software helps developers add guardrails to AI chatbots (in Chinese)](https://blogs.nvidia.com.tw/blog/ai-chatbot-guardrails-nemo/)
- ### 2022/7/31 - [NVIDIA AI platform brings a 30% training speedup to the NeMo Megatron framework for large language models (in Chinese)](https://news.xfastest.com/nvidia/115750/nvidia-ai-nemo-megatron/)
    - The NeMo Megatron update can speed up GPT-3 model training by 30%
    - NeMo Megatron is a fast, efficient, and easy-to-use end-to-end containerized framework for collecting data, training large models, evaluating models against industry-standard benchmarks, and running inference with state-of-the-art latency and throughput.

<hr>

## References - not yet digested
- [Build Enterprise Retrieval-Augmented Generation Apps with NVIDIA Retrieval QA Embedding Model](https://developer.nvidia.com/blog/build-enterprise-retrieval-augmented-generation-apps-with-nvidia-retrieval-qa-embedding-model/)
- [RAG 101: Demystifying Retrieval-Augmented Generation Pipelines](https://developer.nvidia.com/blog/rag-101-demystifying-retrieval-augmented-generation-pipelines/)
- [How to Take a RAG Application from Pilot to Production in Four Steps](https://developer.nvidia.com/blog/how-to-take-a-rag-application-from-pilot-to-production-in-four-steps/)
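
<hr>

## Sketch: the retrieval step

The session notes describe RAG as searching a knowledge base by vector similarity (for example, cosine similarity) and then letting the LLM answer with the retrieved passages as context. The following is a minimal sketch of that flow, not the pipeline shown in the talk. Everything in it is an assumption made for illustration: the three-dimensional toy embeddings, the contents of `KNOWLEDGE_BASE`, and the helper names `cosine_similarity`, `retrieve`, and `build_prompt` are hypothetical and do not come from the slides or from any NVIDIA library. A real system would get its embeddings from an embedding model and send the resulting prompt to an LLM.

```python
import math

# Hypothetical knowledge base: (passage, toy embedding) pairs.
# Real embeddings would come from an embedding model; these are made up.
KNOWLEDGE_BASE = [
    ("DAPT adapts a pretrained model to domain data.", [0.9, 0.1, 0.0]),
    ("Guardrails constrain what a chatbot may discuss.", [0.1, 0.9, 0.1]),
    ("Tokenizers split text into tokens.", [0.2, 0.1, 0.9]),
]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, knowledge_base, top_k=2):
    """Return the top_k passages ranked by cosine similarity to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), passage)
        for passage, embedding in knowledge_base
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]


def build_prompt(question, passages):
    """Place the retrieved passages in the prompt as context for the generator."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    # Made-up query embedding that points mostly along the first axis.
    query_embedding = [0.8, 0.2, 0.1]
    passages = retrieve(query_embedding, KNOWLEDGE_BASE, top_k=2)
    print(build_prompt("What does DAPT do?", passages))
```

The sketch scans every passage, which is fine for a handful of documents; a production retriever would normally use a vector index instead of a linear scan.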