KSLab_M1
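# RGB Summary
<style> .red {color: red;} </style>
<style> .orange {color: orange;} </style>

[toc]

## RGB GitHub link
:::success
https://github.com/chen700564/RGB
:::

## 4 abilities in RGB
1. **Noise Robustness**
    - Def: the ability of the LLM to answer correctly when noisy documents are present in the input.
    - Evaluation Metrics:
        - accuracy: computed with EM
2. **Negative Rejection**
    - Def: the ability of the LLM to decline to answer when every input document is a noisy document.
    - Evaluation Metrics:
        - Rej: rejection rate computed with EM; a refusal counts as a correct rejection only if the LLM's response contains the phrase "no relevant external reference documents"
        - Rej*: ChatGPT judges the response; any response that conveys a refusal to answer counts as a rejection
3. **Information Integration**
    - Def: the ability of the LLM to integrate information from multiple external documents to answer complex questions. Some questions can only be answered by extracting and combining information from several documents.
    - Evaluation Metrics:
        - accuracy: EM comparison between the generated answer and the gold answer
4. **Counterfactual Robustness**
    - Def: the ability of the LLM to recognize and correct false information when the external documents contain factual errors but its internal knowledge is correct.
    - Evaluation Metrics:
        - accuracy: EM matching
        - ED: EM matching
        - ED*: ChatGPT judges the semantics; the response counts as long as it conveys that "the external reference documents are wrong"
        - CR: EM matching

:::success
Note:
- Noisy document: an external document that is related to the query but does not contain the correct answer
- EM (Exact Match)
- ED (Error Detection)
- CR (Correction Rate)
:::

## Papers that reference RGB
1. [How Easily do Irrelevant Inputs Skew the Responses of Large Language Models](https://arxiv.org/pdf/2404.03302)
    - Overview: studies the robustness of LLMs <span class="orange">when they face information that is irrelevant to the question but semantically related to it</span>. The study finds that these models are easily distracted by highly semantically related but irrelevant information, which leads to incorrect answers.
    - Datasets used: PopQA, ENTITYQUESTIONS
    - ![image](https://hackmd.io/_uploads/SkdptogY0.png)
    - Evaluation Metrics:
        - Misrepresentation Ratio: the proportion of cases in which the LLM is misled when given information irrelevant to the query. <span class="red">(similar to the Counterfactual Robustness setting)</span>
        - Uncertainty Ratio: when the LLM is given documents irrelevant to the query, the proportion of its responses that express uncertainty, e.g. "I'm not sure". <span class="red">(similar to the Negative Rejection setting)</span>
    ![image](https://hackmd.io/_uploads/SJyL1SfKC.png)
2. [ClashEval: Quantifying the tug-of-war between an LLM’s internal prior and external evidence](https://arxiv.org/pdf/2404.10198)
    - Overview: explores how LLMs handle retrieved external information, especially when that information may be incorrect or harmful. The main goal is to <span class="orange">evaluate whether an LLM can correctly recognize and handle cases where external information conflicts with its internal prior knowledge</span> <span class="red">(similar to the Counterfactual Robustness setting)</span>
    - Dataset used: a dataset with ground truth, generated with GPT-4o
    - Evaluation Metrics: ![image](https://hackmd.io/_uploads/Byp1eqkF0.png)
3. [Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation](https://arxiv.org/abs/2404.06809)
    - Overview: proposes the Credibility-aware Generation (CAG) framework to mitigate the impact of flawed information introduced during retrieval in RAG. CAG lets the model discern and process information according to its credibility, improving the reliability and correctness of the generated results.
    - Datasets used:
        - HotpotQA, 2WikiMHQA, <span class="red">**RGB**</span>, Musique, ASQA, RealTime QA, TAQA
        - EvolvingTemp QA & NewsPollutedQA (both consist of QA pairs generated with GPT-3.5)
        - ![image](https://hackmd.io/_uploads/SyQv-CBFC.png)
    - Evaluation Metric: EM
    - <span class="red">**Includes Noise Robustness experiments**</span> ![image](https://hackmd.io/_uploads/H1R8NhlYR.png)
4. [SESAME - Self-supervised framework for Extractive queStion Answering over docuMent collEctions](https://assets-eu.researchsquare.com/files/rs-4018202/v1_covered_b672f9fe-9dc8-4d77-ac07-e8870b2e8d75.pdf?c=1710297298)
    - Overview: SESAME outperforms the baseline models in both precision and F1 under the noise-free and 60%-noise settings, showing that it is stable and effective on noisy data.
    - Datasets used: NewsQA (QA pairs generated with GPT-3.5), <span class="red">**RGB (Noise Robustness only)**</span>
    - Evaluation Metrics: Precision, F1
    - <span class="red">**Includes Noise Robustness experiments**</span> ![Screenshot (14)](https://hackmd.io/_uploads/SJqjGCxK0.png)
5. [DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation](https://arxiv.org/pdf/2406.05654)
    - Overview: studies <span class="red">the evaluation of domain-specific RAG systems using a Chinese benchmark</span>, highlights the limitations of large language models (LLMs) on questions from specialized knowledge domains, and proposes six key abilities for evaluating RAG models.
    - Dataset used: QA pairs generated with GPT-4
    - Evaluation Metrics:
        - EM, EMS, F1, Rouge-L, GE
    - <span class="red">**Does not use the RGB dataset, but builds its own noisy QA dataset for Noise Robustness experiments**</span>
    - ![image](https://hackmd.io/_uploads/ByseN0rFC.png)
    - ![image](https://hackmd.io/_uploads/rkefBAltA.png)
    - NC in the figure stands for Noise Count.
6. [Better RAG using Relevant Information Gain](https://arxiv.org/pdf/2407.12101)
    - <span class="red">**Uses the RGB benchmark**</span>
    - ![image](https://hackmd.io/_uploads/B1GjWUIYR.png)
    - Overview: uses the Dartboard method to improve the retrieval step of RAG systems, increasing retrieval diversity and avoiding redundant information; shows strong performance on RGB.
    - Dataset used: NewsQA (QA pairs generated with GPT-3.5)
        - Simple (simple QA): 300 simple test QA pairs (each answerable from a single retrieved passage), 11,641 passages in total
        - Integrated (information-integration QA): 100 integration test QA pairs (each requiring several retrieved passages), 5,701 passages in total
    - Evaluation Metrics: NDCG (higher means the retrieved results better match the query's relevance), QA Accuracy
    ![Screenshot (625)](https://hackmd.io/_uploads/HkHXvBGF0.png)
7. [RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots](https://arxiv.org/pdf/2403.01193)
    - Overview: examines the problem of LLMs generating incorrect or fabricated information (so-called "hallucinations") and analyzes the potential of RAG to reduce these hallucinations.
    - <span class="red">**Includes an experiment similar to Noise Robustness**</span> ![image](https://hackmd.io/_uploads/rJDnkEXFC.png)
8. [Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation](https://arxiv.org/abs/2402.18150)
    - Overview: treats the LLM as an information refiner and proposes INFO-RAG, a training method that further alleviates the information bottleneck of RAG and makes the LLM robust to the retrieved text.
    - 3 training scenarios:
        1. The retrieved documents contain the knowledge needed to answer the question; the LLM extracts the required information and filters out the rest.
        2. The retrieved text contains incomplete or incorrect information; the LLM can still use its internal knowledge to verify, correct, and supplement it. <span class="red">**(similar to the Counterfactual Robustness and Noise Robustness settings)**</span>
        3. The retrieved text contains nothing that answers the question; the LLM can still provide related information by understanding the semantics, which may indirectly help answer the question.
    - Evaluation Metrics: accuracy (computed with EM), ROUGE, F1 ![image](https://hackmd.io/_uploads/r1aXGIzFA.png)
9. [Evaluating the External and Parametric Knowledge Fusion of Large Language Models](https://arxiv.org/pdf/2405.19010)
    - <span class="red">**4 usage scenarios (noisy documents are added in all of them)**</span> ![image](https://hackmd.io/_uploads/ry8hLXPKC.png)
    - Evaluation Metrics: accuracy (Racc) and information coverage (Rcover) ![image](https://hackmd.io/_uploads/SyoIDXwKA.png)
10. [Evaluation of Orca 2 Against Other LLMs for Retrieval Augmented Generation](https://link.springer.com/chapter/10.1007/978-981-97-2650-9_1)
    - Overview: examines how the Orca 2 language model performs on retrieval-augmented generation (RAG) tasks and compares it with other major LLMs (e.g. Llama-2, GPT-3.5-Turbo, and GPT-4)
    - <span class="red">Mentions RGB only in the introduction and related work; uses neither the RGB dataset nor the RGB benchmark</span>
    - Datasets used: Mutag, IMDB-Binary (IMDB), DD, Proteins, and Graph-Twitter (Twitter)
    - Evaluation Metrics: Faithfulness, Answer Relevance, Overall Score, Inference Speed
11. [TorchOpera: A Compound AI System for LLM Safety](https://arxiv.org/pdf/2406.10847)
    - <span class="red">Does not use the RGB dataset and runs no RGB-like experiments; RGB is only briefly mentioned in the introduction and related work</span>
    - Overview: studies hallucination detection and unsafe user-input detection. For hallucination detection, a model is fine-tuned on the HaluEval dataset, and structured prompts are built to identify hallucinations in LLM outputs. For unsafe user-input detection, a model is fine-tuned to detect harmful content in user inputs (e.g. toxicity, prompt injection, stereotypes, harassment, threats, obscenity, identity attacks, and violence); its training set is built from data randomly sampled from 15 public data sources to reduce the gap between open-source data and the distribution of real user queries.
    - Datasets used: E-Commerce, ChatDoctor, PatientDoctorChat
    - Evaluation Metrics: Accuracy, Recall, F1
    - ![image](https://hackmd.io/_uploads/ByJF6TxYR.png)
12. [UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis](https://arxiv.org/pdf/2406.15187)
    - <span class="red">Does not use the RGB dataset and runs no RGB-like experiments; RGB is only mentioned among prior benchmarks</span>
    - ![image](https://hackmd.io/_uploads/HJ2aRHIYR.png)
    - Overview: the UDA dataset contains 2,965 real-world documents and 29,590 expert-annotated QA pairs spanning three domains: finance, academia, and world knowledge. The paper evaluates a range of RAG techniques and compares them with long-context LLMs, particularly on long documents and complex queries. It measures the effectiveness of retrieval strategies with the relative length of the longest common subsequence (LCS), in order to identify retrieved chunks that contain factual evidence.
    - Datasets used: FinHybrid, TatHybrid, PaperTab, PaperText, FetaTab, NqText
    - Evaluation Metrics: EM, F1
13. [Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices](https://arxiv.org/pdf/2406.03777)
    - Overview: using the LaMP dataset, compares model-compression techniques (quantization, pruning, and knowledge distillation), explores how different hyperparameter settings affect model performance, studies how the amount of user history data affects retrieval-augmented generation (RAG) performance, and evaluates how these techniques run on resource-constrained edge devices, with the aim of optimizing the deployment and performance of LLMs on edge devices.
    - Dataset used: LaMP
    - Evaluation Metrics: Accuracy, ROUGE-1, Normalized Accuracy (introduced because objective human-performance data is unavailable)
14. [Multi-Head RAG: Solving Multi-Aspect Problems with LLMs](https://arxiv.org/pdf/2406.05085)
    - <span class="red">Mentions the RGB paper only in the introduction</span>
    - ![image](https://hackmd.io/_uploads/rJaieLIFR.png)
    - Overview: designs the Multi-Head RAG (MRAG) method and compares it with standard RAG and Split RAG on multi-aspect queries. MRAG significantly improves both the retrieval success ratio and the weighted retrieval success ratio, especially on complex queries that need multi-aspect context.
    - Datasets used: ![image](https://hackmd.io/_uploads/SJsXAzZK0.png)
15. [Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework](https://arxiv.org/pdf/2406.14783)
    - <span class="red">RGB is only mentioned in related work</span>
    - ![image](https://hackmd.io/_uploads/S1g4BIUFC.png)
    - Overview: proposes an evaluation method based on LLM-as-a-judge to overcome the lack of large-scale test sets and "golden answers" in traditional evaluation. <span class="red">**No ground truth**</span>
    - Synthetic test-set generation: LLMs generate synthetic queries based on real user queries and internal company documents. Concretely, document passages are sampled at random and injected into the LLM prompt to generate questions a user might ask, producing a large-scale test set for evaluating the system.
    - LLM-as-a-judge: without "golden answers", an LLM evaluates the quality of the answers produced by retrieval-augmented generation (RAG) systems. The workflow is: an LLM generates synthetic queries; different RAG pipelines generate answers; a separate LLM acts as the judge, compares the answers from two RAG pipelines, and picks the better one.
    - Evaluation Metrics: Relevance, Accuracy, Completeness, Precision, MRR@5
16. [ACTIVERAG: Revealing the Treasures of Knowledge via Active Learning](https://arxiv.org/abs/2402.13547)
    - Where RGB is mentioned: the traditional RAG (retrieval-then-generation) pipeline can be affected by noisy documents
    - Overview: proposes the ACTIVERAG framework, which moves from passive knowledge acceptance to an active learning mechanism. Through knowledge construction and cognitive nexus mechanisms, the LLM comes to understand external knowledge, improving performance on QA datasets by 5%.
    - Datasets used: NQ, TriviaQA, WebQA
    - Evaluation Metrics: accuracy computed with StringEM

## Dataset
- [HotpotQA](https://arxiv.org/abs/1809.09600) :heavy_check_mark:
- [2WikiMHQA](https://arxiv.org/abs/2011.01060) :heavy_check_mark:
- [Musique](https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00475/110996/MuSiQue-Multihop-Questions-via-Single-hop-Question)
- [ASQA](https://arxiv.org/abs/2204.06092)
- [PopQA](https://aclanthology.org/2023.acl-long.546/) :heavy_check_mark:
- [ENTITYQUESTIONS](https://aclanthology.org/2021.emnlp-main.496/)
- [NQ](https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00276/43518/Natural-Questions-A-Benchmark-for-Question) :heavy_check_mark:
- [TriviaQA](https://arxiv.org/abs/1705.03551) :heavy_check_mark:
- [WebQA](https://aclanthology.org/D13-1160.pdf) :heavy_check_mark:
- [RealTime QA](https://proceedings.neurips.cc/paper_files/paper/2023/file/9941624ef7f867a502732b5154d30cb7-Paper-Datasets_and_Benchmarks.pdf) :heavy_check_mark:
- [TAQA](https://arxiv.org/pdf/2402.16797)
- [ZS](https://arxiv.org/abs/1706.04115) :heavy_check_mark:
- [ELI5](https://arxiv.org/abs/1907.09190) :heavy_check_mark:
- [WoW](https://arxiv.org/abs/1811.01241) :heavy_check_mark: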

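The EM-based RGB metrics listed under **4 abilities in RGB** are accuracy (Noise Robustness, Information Integration) and the Rej rejection rate. They can be sketched as below. This is a minimal sketch and not RGB's official evaluation script. It assumes that "EM" here means a normalized gold-answer string appears verbatim in the response. `REFUSAL_PHRASE` is a hypothetical placeholder for the fixed refusal sentence that the prompt asks the model to output. Rej* and ED* rely on a ChatGPT judge and are not covered.

```python
import string

# Hypothetical placeholder: replace with the exact refusal sentence that the
# RGB prompt instructs the model to produce when no document is useful.
REFUSAL_PHRASE = "insufficient information"

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    return " ".join(text.lower().translate(_PUNCT_TABLE).split())


def em_hit(response: str, gold_answers: list[str]) -> bool:
    """True if any non-empty normalized gold answer appears verbatim in the response."""
    resp = normalize(response)
    for ans in gold_answers:
        norm = normalize(ans)
        if norm and norm in resp:
            return True
    return False


def accuracy(responses: list[str], golds: list[list[str]]) -> float:
    """Share of responses that contain a gold answer (Noise Robustness, Integration)."""
    if not responses:
        return 0.0
    hits = sum(em_hit(r, g) for r, g in zip(responses, golds))
    return hits / len(responses)


def rejection_rate(responses: list[str], phrase: str = REFUSAL_PHRASE) -> float:
    """Rej: share of responses that contain the refusal phrase verbatim."""
    if not responses:
        return 0.0
    target = normalize(phrase)
    return sum(target in normalize(r) for r in responses) / len(responses)
```

The sketch treats any single alias in `gold_answers` as enough for a hit. That is a simplification, because Information Integration questions can have several answer parts, and a stricter check would require every part to appear.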

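The Misrepresentation Ratio and Uncertainty Ratio from *How Easily do Irrelevant Inputs Skew the Responses of Large Language Models* can be roughly sketched as below. The sketch rests on several assumptions:

- Each example carries the correct answer and the answer implied by the irrelevant passage. The `misleading_answer` field is hypothetical.
- "Misled" is approximated as the response naming the misleading answer without the correct one.
- `UNCERTAINTY_CUES` is a hypothetical keyword list.

The paper's actual labeling procedure may differ.

```python
from dataclasses import dataclass

# Hypothetical keyword list for detecting expressions of uncertainty.
UNCERTAINTY_CUES = ("i'm not sure", "i am not sure", "i don't know", "uncertain")


@dataclass
class Example:
    response: str           # model output given the irrelevant passage(s)
    correct_answer: str     # gold answer to the query
    misleading_answer: str  # hypothetical field: answer implied by the irrelevant passage


def is_misled(ex: Example) -> bool:
    """Heuristic: the response names the misleading answer but not the correct one."""
    text = ex.response.lower()
    return ex.misleading_answer.lower() in text and ex.correct_answer.lower() not in text


def is_uncertain(ex: Example) -> bool:
    """Heuristic: the response contains one of the uncertainty cues."""
    text = ex.response.lower()
    return any(cue in text for cue in UNCERTAINTY_CUES)


def misrepresentation_ratio(examples: list[Example]) -> float:
    """Share of examples where the model adopted the misleading answer."""
    return sum(map(is_misled, examples)) / len(examples) if examples else 0.0


def uncertainty_ratio(examples: list[Example]) -> float:
    """Share of examples where the model expressed uncertainty."""
    return sum(map(is_uncertain, examples)) / len(examples) if examples else 0.0
```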

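Several of the papers above report F1 (SESAME, DomainRAG, INFO-RAG, UDA), and SESAME also reports precision. A common choice for extractive QA is SQuAD-style token-overlap F1. The sketch below assumes that variant, which may not match each paper's exact implementation.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and English articles."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_prf(prediction: str, reference: str) -> tuple[float, float, float]:
    """Token-overlap precision, recall, and F1 between a prediction and one reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection: each shared token counts min(pred_count, ref_count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def best_f1(prediction: str, references: list[str]) -> float:
    """Max F1 over several acceptable references."""
    return max((token_prf(prediction, ref)[2] for ref in references), default=0.0)
```

Whitespace tokenization suits English. For Chinese answers such as DomainRAG's, character-level tokens are a common substitute. That is an assumption here and is not taken from that paper.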

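*Better RAG using Relevant Information Gain* reports NDCG for retrieval. Below is a minimal sketch of NDCG@k with the usual log2 rank discount. It assumes a hypothetical input format: a `relevances` list holding one relevance score per retrieved passage, in ranked order. The paper may define relevance differently.

```python
import math
from typing import Optional


def dcg_at_k(relevances: list[float], k: int) -> float:
    """DCG@k = sum over ranks i = 1..k of rel_i / log2(i + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(
    relevances: list[float],
    k: int,
    ideal_relevances: Optional[list[float]] = None,
) -> float:
    """NDCG@k: DCG of the system ranking divided by the DCG of the ideal ranking.

    relevances: relevance of each retrieved passage, in ranked order.
    ideal_relevances: relevance of all relevant passages, if known; otherwise
    the ideal ranking is built by re-sorting the retrieved list itself.
    """
    pool = relevances if ideal_relevances is None else ideal_relevances
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

When the full set of relevant passages is known, pass their scores as `ideal_relevances`. Otherwise the ideal ranking comes from the retrieved list alone, which can overstate the score when relevant passages were missed.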

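The UDA entry scores retrieval by the relative length of the longest common subsequence (LCS) between the evidence and a retrieved chunk. The sketch below computes a token-level LCS and normalizes it by the evidence length. Both the tokenization and the normalization are assumptions, not taken from the UDA code.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via DP with two rolling rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise keep the best neighbor.
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def relative_lcs(evidence: str, chunk: str) -> float:
    """LCS length divided by the evidence length (assumed normalization)."""
    ev_tokens = evidence.lower().split()
    ch_tokens = chunk.lower().split()
    if not ev_tokens:
        return 0.0
    return lcs_length(ev_tokens, ch_tokens) / len(ev_tokens)
```

A chunk could then be flagged as containing evidence when its score exceeds some threshold. Any particular threshold value would be a hypothetical choice.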

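RAGElo lists MRR@5 among its metrics. MRR@k averages, over queries, the reciprocal rank of the first relevant result within the top k, counting 0 when none appears. The sketch assumes binary relevance labels are already available. Since the note states RAGElo has no ground truth, those labels would presumably come from the LLM judge.

```python
def reciprocal_rank_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """1/rank of the first relevant item within the top k, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mrr_at_k(runs: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Mean reciprocal rank over (ranked_ids, relevant_ids) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)
```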

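RAGElo ranks RAG pipelines from pairwise LLM-judge verdicts with an Elo-style rating. The sketch below uses the standard Elo update. Several details are hypothetical defaults rather than RAGElo's actual configuration: the starting rating of 1000, `k=32`, the `(agent_a, agent_b, score_a)` judgment format, and the agent names.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the Elo model (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Return updated (rating_a, rating_b); score_a is 1 for an A win, 0.5 tie, 0 loss."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Zero-sum update: B loses exactly what A gains, and vice versa.
    return rating_a + delta, rating_b - delta


def run_tournament(
    agents: list[str],
    judgments: list[tuple[str, str, float]],
    initial: float = 1000.0,
    k: float = 32.0,
) -> dict[str, float]:
    """Apply elo_update for each (agent_a, agent_b, score_a) verdict, in order."""
    ratings = {agent: initial for agent in agents}
    for a, b, score_a in judgments:
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], score_a, k)
    return ratings
```

Each update is zero-sum, so the ratings always sum to the number of agents times the initial rating. The final ratings depend on the order of the judgments. One way to reduce that effect is to shuffle and repeat the matches.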