
Deeplearning.ai GenAI/LLM Course Series Notes

Large Language Models with Semantic Search

Finetuning Large Language Models



Large Language Models with Semantic Search

ReRank

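Course overview

  • The lesson explains how keyword search, dense retrieval, and ReRank work
  • Keyword search tends to return results that are not relevant enough
  • Dense retrieval tends to return results that are incorrect
  • ReRank assigns a relevance score to each query-response pair, which is then used to sort the results
  • ReRank can improve both keyword search and dense retrieval so that the correct answer is found
  • How to evaluate the quality of a search system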

Semantic-similarity search alone is not enough to get the correct answer: Dense Retrieval is also not perfect

Figure: Dense Retrieval is also not perfect

Solution: rerank the search results (ReRank)

Figure: Solution: ReRank

How ReRank gets trained

ReRank is trained on lots of QA pairs
correct
Figure: a correct QA pair
wrong
Figure: a wrong QA pair
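  • The ReRank model is a pretrained model that was trained with supervised learning
  • It is given a large number of correct query-document pairs (QA pairs) and learns to assign them high scores
  • It is also given a large number of incorrect query-document pairs and learns to assign them low scores
  • Training maximizes the scores of correct pairs and minimizes the scores of incorrect pairs
  • After training, ReRank can tell how relevant a document is to a query
  • It can therefore sort retrieved results by relevance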
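
The notes do not say which loss function is used. As a minimal sketch of the idea in the bullets above, assume a binary cross-entropy objective applied to a raw relevance score. The `pair_loss` helper and the example scores are hypothetical and are not Cohere's actual training code.

```python
import math

def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def pair_loss(score, label):
    """Binary cross-entropy for one (query, document) pair.

    score: raw relevance score produced by a (hypothetical) reranker
    label: 1 for a correct pair, 0 for a wrong pair
    """
    p = sigmoid(score)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# A correct pair that already gets a high score contributes a small loss.
low_loss = pair_loss(score=3.0, label=1)
# A wrong pair that gets the same high score contributes a large loss.
high_loss = pair_loss(score=3.0, label=0)

print(low_loss < high_loss)
```

Under this assumption, minimizing the loss pushes the scores of correct pairs up and the scores of wrong pairs down, which is the behaviour the bullets describe.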

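The performance of large language models (LLMs) can be improved in several main ways:

  • Larger-scale pretraining: pretraining on more, and more diverse, text data improves the model's understanding
  • Fine-tuning: further training the model on labeled domain-specific data so that it adapts better to downstream tasks
  • Prompt learning: giving the model only a few example inputs and outputs so that it learns to perform a specific task
  • Knowledge distillation: transferring the essential knowledge of a large teacher model into a small student model
  • Model architecture optimization: improving the model's architecture and parameters to raise performance
  • Reinforcement learning: continually improving the model's policy through interaction with an environment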
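    The course describes ReRank as trained on labeled correct and incorrect query-document pairs. Since this is supervised training on a large number of pairs that updates the model's weights, it is closer to fine-tuning than to prompt learning, which supplies only a few examples in the prompt.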

Improving Keyword Search with ReRank

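  • Passing keyword search results to the ReRank model for sorting finds the answer most relevant to the query
  • Keyword search finds documents that share vocabulary with the query, but they do not necessarily answer the question
  • ReRank assigns a relevance score to each query-response pair
  • With ReRank, the correct answer can be found among the keyword search results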
Code implementation
  • Environment setup

    import os
    from dotenv import load_dotenv, find_dotenv
    _ = load_dotenv(find_dotenv())  # read local .env file

    import cohere
    co = cohere.Client(os.environ['COHERE_API_KEY'])

    import weaviate
    auth_config = weaviate.auth.AuthApiKey(
        api_key=os.environ['WEAVIATE_API_KEY'])
    client = weaviate.Client(
        url=os.environ['WEAVIATE_API_URL'],
        auth_client_secret=auth_config,
        additional_headers={
            "X-Cohere-Api-Key": os.environ['COHERE_API_KEY'],
        })
  • Call the rerank model

    • This uses co.rerank from the Cohere API, which lets you specify a pretrained reranker model
      • model = 'rerank-english-v2.0'

    def rerank_responses(query, responses, num_responses=10):
        # Score each response against the query and return the top_n, best first
        reranked_responses = co.rerank(
            model='rerank-english-v2.0',
            query=query,
            documents=responses,
            top_n=num_responses,
        )
        return reranked_responses
    • Inspect the reranked results

      • keyword_search example

      # keyword_search is assumed to come from the course's utils module, like dense_retrieval
      from utils import keyword_search

      query_1 = "What is the capital of Canada?"
      results = keyword_search(query_1,
                               client,
                               properties=["text", "title", "url", "views", "lang", "_additional {distance}"],
                               num_results=3)
      texts = [result.get('text') for result in results]
      reranked_text = rerank_responses(query_1, texts)

      import pandas as pd
      # Unpack each rerank result into separate columns
      df_reranked = pd.DataFrame(reranked_text)
      df_reranked['relevance_score'] = df_reranked.apply(lambda row: row.values[0].relevance_score, axis=1)
      df_reranked['index'] = df_reranked.apply(lambda row: row.values[0].index, axis=1)
      df_reranked['document'] = df_reranked.apply(lambda row: row.values[0].document['text'], axis=1)
      df_reranked.iloc[:,1:]

      • The query results are now sorted by relevance score

Improving Dense Retrieval with ReRank

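  • Passing dense retrieval results to the ReRank model for sorting finds the most relevant answer
  • Dense retrieval finds the responses closest to the query in embedding space, but they are not necessarily correct
  • ReRank picks the correct answer out of the dense retrieval results
  • ReRank assigns a relevance score to each query-response pair; the highest-scoring one is usually the correct answer
Code implementation
  • dense_retrieval example

    from utils import dense_retrieval

    query_2 = "Who is the tallest person in history?"
    results = dense_retrieval(query_2, client)
    texts = [result.get('text') for result in results]
    reranked_text = rerank_responses(query_2, texts)
    df_reranked = pd.DataFrame(reranked_text)
    # (omitted) organize the columns with pandas in the same way as above...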

Evaluating Search/Recommendation Systems

Common metrics for evaluating search/recommendation systems

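| Metric | Pros | Cons | When to use |
| --- | --- | --- | --- |
| Mean Average Precision (MAP) | Considers precision across the whole recall range; averages performance over all queries | Strongly affected by individual queries; insensitive to partially relevant results | The relevance of every retrieved result must be known; overall system performance evaluation |
| Mean Reciprocal Rank (MRR) | Simple and direct; measures how quickly a relevant result appears | Only considers the first relevant result; insensitive to later results | When speed is the focus; only some of the relevant results need to be known |
| Normalized Discounted Cumulative Gain (NDCG) | Considers both relevance and rank; applies a discount at each position | Requires choosing a discount factor; more abstract | Relevance scores for the results are required; evaluating ranking quality |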

Notes on RAG retrieval evaluation metrics (RAG Triad of metrics)

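  • Mean Average Precision (MAP)
    In information retrieval, machine learning, and computer vision, MAP is a common metric for evaluating a system or model, especially for ranking problems and multi-label classification problems. In multi-label classification, MAP is usually computed from the Precision-Recall curve.

    • Precision
      • (predicted positive and actually positive) / (predicted positive)
      • Given a ranked result list (the x-axis can be Recall or the position in the query results), Precision is the fraction of correct results up to a given position.
    • Recall
      • (predicted positive and actually positive) / (actually positive)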
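
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the `precision_recall` helper and the document ids are hypothetical, and the retrieved set plays the role of "predicted positive" while the relevant set plays the role of "actually positive".

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two sets of item ids.

    predicted: items the system returned (predicted positive)
    actual:    items that are truly relevant (actually positive)
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 documents truly relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d5"}
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were retrieved.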
    Figure: Calculation of Precision, Recall and Accuracy in the confusion matrix
    Figure: Visualization of the AP metric, estimated by the area below the Precision-Recall curve at different confidence intervals.

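    When MAP is computed for information retrieval/recommendation systems, the x-axis is the position in the query/recommendation results instead of Recall.

    Computation steps:

    • For the returned result list, compute the precision at each recall level
    • Order the precision values by recall to form the Precision-Recall curve
    • Compute the area under the curve, which gives the Average Precision (AP) of the query
      For search/recommendation systems, the Average Precision can be written as: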

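    $$AP = \frac{1}{R}\sum_{k=1}^{n} P(k)\,rel(k)$$

    where $P(k)$ is the precision of the top $k$ results, $rel(k)$ is the relevance of the $k$-th result ($1$ for relevant, $0$ for not relevant), $n$ is the total number of returned results, and $R = \sum_{k=1}^{n} rel(k)$ is the number of relevant results among them. Dividing by $R$ keeps AP between 0 and 1.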

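    Averaging the AP over all queries (in object detection, over all classes instead) gives the MAP of the whole system:

    $$MAP = \frac{1}{Q}\sum_{i=1}^{Q} AP(i)$$

    where $Q$ is the total number of queries and $AP(i)$ is the AP of the $i$-th query.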
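
As a minimal sketch of the AP and MAP computation described above: each query is represented by a list of 0/1 relevance labels in ranked order. The function names and the sample labels are hypothetical, and AP is normalized by the number of relevant results in the list, which is one common convention (an assumption here; some definitions divide by the number of relevant documents in the whole collection).

```python
def average_precision(relevance):
    """AP for one ranked list; relevance[k-1] is rel(k), either 1 or 0."""
    hits = 0
    precision_sum = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k  # P(k), counted only where rel(k) = 1
    return precision_sum / hits if hits else 0.0

def mean_average_precision(relevance_lists):
    """MAP: the mean of AP over all queries."""
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)

# Two hypothetical queries with 4 returned results each.
queries = [
    [1, 0, 1, 0],  # relevant results at ranks 1 and 3
    [0, 1, 0, 0],  # relevant result at rank 2
]
ap_values = [average_precision(r) for r in queries]
map_value = mean_average_precision(queries)
print(ap_values, map_value)
```

For the first query, P(1) = 1 and P(3) = 2/3, so AP = (1 + 2/3) / 2; for the second, AP = 1/2.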

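  • Mean Reciprocal Rank (MRR)

    • MRR is the average of the reciprocal ranks over all queries.
    • For a single query, the Reciprocal Rank (RR) is computed as follows:

    If the first relevant result of the query appears at rank k, the reciprocal rank of that query is 1/k.
    If there is no relevant result, the reciprocal rank is 0. That is:

    $$RR = \begin{cases} \dfrac{1}{rank} & \text{if a relevant document is retrieved} \\ 0 & \text{otherwise} \end{cases}$$

    The Mean Reciprocal Rank is then the average of the reciprocal ranks of all queries:

    $$MRR = \frac{1}{|Q|}\sum_{i=1}^{|Q|} RR(i)$$

    where $|Q|$ is the total number of queries and $RR(i)$ is the reciprocal rank of the $i$-th query.

    MRR lies between 0 and 1; a larger value means the system returns relevant documents sooner.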

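    • Worked example
      Suppose there are three queries whose first correct answers appear at rank 1, rank 3, and rank 2:

      1. Query 1: the first correct answer is at rank 1, so the reciprocal is $\frac{1}{1} = 1$
      2. Query 2: the first correct answer is at rank 3, so the reciprocal is $\frac{1}{3} \approx 0.33$
      3. Query 3: the first correct answer is at rank 2, so the reciprocal is $\frac{1}{2} = 0.5$

      The MRR is therefore:

      $$MRR = \frac{1}{3}\left(1 + \frac{1}{3} + \frac{1}{2}\right) \approx \frac{1}{3}(1 + 0.33 + 0.5) \approx 0.61$$

      This means that, across these three queries, the average reciprocal rank of the first correct answer is about 0.61. The closer MRR is to 1, the sooner the system returns the correct answer and the better it performs.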
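
The same three-query example can be reproduced in code. This is a small sketch with hypothetical function names; each query is a list of 0/1 relevance labels in ranked order.

```python
def reciprocal_rank(relevance):
    """1/rank of the first relevant result, or 0 if none is relevant."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(relevance_lists):
    """MRR: the mean of the reciprocal ranks over all queries."""
    return sum(reciprocal_rank(r) for r in relevance_lists) / len(relevance_lists)

# First correct answer at rank 1, rank 3, and rank 2 respectively.
queries = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
]
mrr = mean_reciprocal_rank(queries)
print(round(mrr, 2))  # 0.61
```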

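  • Normalized Discounted Cumulative Gain (NDCG)
    NDCG takes into account both the relevance score of each result and a weight (penalty) based on its position: a highly relevant result that is ranked low is discounted. Recommended companion reading: 2021.012。衛星。知乎。NDCG排序評估指標 (NDCG ranking evaluation metric).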

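    For a single query, NDCG is computed as follows:

    • Cumulative Gain: each result has a relevance score $rel(i)$, with results listed in rank order from the top down
      • $p$ means that the top $p$ results of the list are considered
      • $rel(i)$ is the relevance score of the $i$-th result, so $Gain = rel(i)$
      • The Cumulative Gain (CG) over the top $p$ positions is

        $$CG@p = rel(1) + \dots + rel(p) = \sum_{i=1}^{p} rel(i)$$
      • CG only sums the relevance scores and ignores the order of the returned results, which is why a discount factor is introduced next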
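    • Discounted Cumulative Gain: takes the ranking order into account, so that top-ranked items contribute more gain and lower-ranked items are discounted

      • $DCG@p$ is the discounted cumulative gain summed over the top $p$ results
        $$DCG@p = \sum_{i=1}^{p} \frac{rel(i)}{\log_2(i+1)}$$
        • $DCG@p$ also has an alternative form
          • Each $rel(i)$ is first transformed by $2^{rel(i)} - 1$, which widens the gap between relevance grades
            $$DCG@p = \sum_{i=1}^{p} \frac{2^{rel(i)} - 1}{\log_2(i+1)}$$
          • When $rel(i)$ only takes the values 0 and 1, the two forms give the same result, because $2^{rel(i)} - 1 = rel(i)$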
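    • NDCG (Normalized DCG): the discounted cumulative gain is normalized so that its maximum value is 1, which gives NDCG:

      • IDCG is the Ideal Discounted Cumulative Gain, the DCG of the best possible ordering. It is the highest DCG that can be obtained from the current result set if the results are sorted perfectly by relevance score from high to low.
        IDCG is computed by:
        • Sorting the current result set by relevance score rel from high to low
        • Computing the DCG of this perfect ordering, which is the IDCG
      • NDCG lies between 0 and 1; a larger value means the ranking does a better job of returning relevant documents first

    $$IDCG@p = \max_{\pi \in \mathrm{perm}(n)} DCG@p(\pi)$$

    where $\mathrm{perm}(n)$ is the set of all orderings of the $n$ results, and

    $$NDCG@p = \frac{DCG@p}{IDCG@p}$$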
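
The DCG, IDCG, and NDCG steps above can be sketched as follows. The function names and the graded relevance scores are hypothetical, and the sketch uses the first DCG form, $rel(i) / \log_2(i+1)$, with IDCG taken from the same result list sorted by relevance.

```python
import math

def dcg_at_p(relevance, p):
    """DCG@p: sum of rel(i) / log2(i + 1) over the top p positions."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevance[:p], start=1))

def ndcg_at_p(relevance, p):
    """NDCG@p: DCG@p divided by the DCG@p of the ideal ordering."""
    ideal = sorted(relevance, reverse=True)  # perfect ordering of the same results
    idcg = dcg_at_p(ideal, p)
    return dcg_at_p(relevance, p) / idcg if idcg > 0 else 0.0

# Hypothetical graded relevance scores, in the order the system returned them.
ranked = [3, 2, 3, 0, 1, 2]
score = ndcg_at_p(ranked, p=6)
print(round(score, 3))
```

A list that is already sorted by relevance scores exactly 1, and any other ordering of the same results scores lower.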

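    • Simulating positions 1 to 100: how the value of the discount factor

      $\log_2(i+1)$ changes

      • For roughly the first 15 positions the curve is steeper, which helps widen the differences between Gain values

      import numpy as np

      # Generate discount values for positions from 1 to 100
      positions = np.arange(1, 101)
      discounted_values = np.log2(positions + 1)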

Reference

2023.03。Sumit Kumar。Zero and Few Shot Text Retrieval and Ranking Using Large Language Models

  • InPars
  • Unsupervised Passage Re-ranker (UPR)

2023.05。Jerry Liu。LlamaIndex。Using LLM’s for Retrieval and Reranking


Two-stage retrieval pipeline: 1) Top-k embedding retrieval, then 2) LLM-based reranking

2022。Prakhar Mishra。paperspace.com。Prompt-based Learning Paradigm in NLP

Figure: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Prompt-based learning: the idea is to design the input to fit the model

2022.04。sergio。【NLP】Prompt Learning 超强入门教程 (a beginner's tutorial on Prompt Learning)


2023.03。Tullie Murrell。shaped.ai。Evaluating recommendation systems (mAP, MMR, NDCG)

  • Mean Average Precision (mAP)
  • Mean Reciprocal Rank (MRR)
  • Normalized Discounted Cumulative Gain (NDCG)

2021.012。衛星。知乎。NDCG排序評估指標 (NDCG ranking evaluation metric)