Finetuning Large Language Models

Considerations on getting started now

Course overview

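Practical steps before starting: define the task, collect relevant data, and finetune a small model (400M-1B parameters) first
Harder tasks, such as writing, may need a larger model
Choose a model size that fits your hardware; for example, a single V100 GPU ("1 V100") can run inference on a 7B-parameter model
PEFT (parameter-efficient finetuning) methods train a model while updating far fewer parameters
LoRA (Low-Rank Adaptation) greatly reduces the number of trained parameters, and its weights can be merged into the main pretrained weights at inference time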

Practical approach to finetuning

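Figure out your task
- Before finetuning, be clear about what you want the model to do
Collect data related to the task's inputs/outputs
- To train the model, collect data relevant to your task, such as question-answer pairs or classification examples
Generate data if you don't have enough
- If data is insufficient, use prompt templates to create more
Finetune a small model
- Start with a small model, for example 400M to 1B parameters, to get an initial sense of performance
Vary the amount of data you give the model
- Try different amounts of data to see how data volume influences where the model goes
Evaluate
- After finetuning, evaluate the model to see what is going well and what needs improvement
Collect more data
- Based on the evaluation, you may need more data to improve the model
Increase task complexity
- Once the model does well on a simple task, try increasing complexity, for example by combining multiple tasks
Increase model size for performance
- More complex tasks may need a larger model to perform well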
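
The data steps above (formatting input/output pairs with a prompt template, then varying how much data is used) can be sketched roughly as follows. This is a minimal illustration only: the template wording, the `PROMPT_TEMPLATE` name, and the `build_examples` and `subsets_by_size` helpers are assumptions made for this sketch, not something the course prescribes.

```python
# Hypothetical prompt template; the exact wording is an assumption.
PROMPT_TEMPLATE = "### Question:\n{question}\n\n### Answer:\n"


def build_examples(pairs):
    """Turn raw (question, answer) pairs into input/output training records."""
    return [
        {"input": PROMPT_TEMPLATE.format(question=q), "output": a}
        for q, a in pairs
    ]


def subsets_by_size(examples, sizes):
    """Return {size: first `size` examples} so runs with different data amounts can be compared."""
    return {n: examples[:n] for n in sizes if n <= len(examples)}


raw_pairs = [
    ("What is finetuning?", "Continuing to train a pretrained model on task data."),
    ("What is PEFT?", "Finetuning that updates only a small share of parameters."),
    ("What is LoRA?", "A PEFT method that trains low-rank update matrices."),
]

examples = build_examples(raw_pairs)
data_by_size = subsets_by_size(examples, sizes=[1, 2, 3])

# Show one formatted training record.
print(examples[0]["input"] + examples[0]["output"])
```

Each entry of `data_by_size` could feed a separate finetuning run of the small model, so the evaluation step can compare how performance changes with data volume.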

Tasks to finetune vs. model size

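Task complexity:
Tasks are generally harder when the model has to produce more tokens (for example, long articles or code)
- Extract tasks:
  - "Reading" is easier: these tasks usually ask the model to pull specific information out of the data it is given
    - Keywords: identify the main terms in a text
    - Topics: determine the main topic or theme of a text
    - Routing: direct a query to the right resource or agent based on its content
    - Agents: the model may need to understand or identify specific agents or entities
- Expand tasks:
  - "Writing" is harder: these tasks ask the model to produce new content, which is usually harder than extraction
    - Chat: interact with users, answer questions, or hold a conversation
    - Write emails: generate email content from specific instructions or queries
    - Write code: generate code for a specific function

Complexity of combining tasks:
- Combining multiple tasks is harder than a single task
  - Asking the model to do several things at once, such as classifying and generating together, usually increases complexity
  - The model may have to perform several operations in one step rather than separately
Model size vs. task difficulty:
- The harder or more general the task, the larger the model needed
  - Harder or broader tasks, such as writing or code generation, usually need a larger LLM to perform well
Flexibility of agents:
- You may want an agent to be flexible, doing several things at once or in a single step
  - For example, an LLM may need to provide an answer, generate related code, and give references in a single query
- This flexibility lets the model respond to user needs more effectively, but it also adds complexity.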

Model Sizes x Compute

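AWS Instance
- AWS offers several GPU instance types, such as "1 V100", and you can choose according to your needs
- These GPUs are available on other cloud platforms as well as AWS
GPU Memory
- GPU memory determines how large a model the GPU can run
- For example, a GPU with 16GB of memory can run inference on a 7B-parameter model, but for training it may only fit a 1B-parameter model
Max inference size
- The largest model, in number of parameters, that fits for inference (that is, prediction)
Max training size
- Unlike inference, training needs extra memory to store gradients and optimizer states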
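
The gap between the inference and training limits above can be checked with some back-of-the-envelope arithmetic. The sketch below is only a rough estimate under stated assumptions: 2 bytes per parameter for 16-bit inference weights, and a rule-of-thumb 16 bytes per parameter for mixed-precision Adam training (16-bit weights and gradients, plus 32-bit master weights and two 32-bit optimizer moments). Activations and framework overhead are ignored, so real usage will be higher. The function names are made up for this illustration.

```python
GIB = 1024 ** 3  # bytes per GiB


def inference_memory_gib(num_params, bytes_per_param=2):
    """Weights only, assuming 16-bit weights (2 bytes per parameter)."""
    return num_params * bytes_per_param / GIB


def training_memory_gib(num_params, bytes_per_param=16):
    """Assumed rule of thumb: weights + gradients + optimizer states, no activations."""
    return num_params * bytes_per_param / GIB


def fits(required_gib, gpu_gib=16):
    """True if the estimate fits in a GPU with `gpu_gib` of memory."""
    return required_gib <= gpu_gib


for label, params in [("1B", 1e9), ("7B", 7e9)]:
    inf = inference_memory_gib(params)
    train = training_memory_gib(params)
    print(f"{label}: inference ~{inf:.1f} GiB (fits: {fits(inf)}), "
          f"training ~{train:.1f} GiB (fits: {fits(train)})")
```

Under these assumptions a 7B model fits a 16GB GPU for inference but not for training, while a 1B model sits just under the limit for training, which is consistent with the figures quoted above.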

PEFT: Parameter-Efficient Finetuning

PEFT methods address the infeasibility and impracticality of fully finetuning large language models (LLMs) by training only a small subset of parameters

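Parameter-efficient finetuning methods fall into three categories:
- Addition-based (A): add extra parameters
  - Adapter-like
  - Soft prompts
- Selection-based (S): update only a selected subset of parameters
- Reparametrization-based (R): reparametrization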
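
To make the selection-based category concrete, here is a small sketch that marks only a subset of parameters as trainable and reports what fraction of the layer that is. Everything in it is assumed for illustration: the parameter names and sizes are invented, and "train only the bias terms" is just one possible selection rule, not a method taken from these notes.

```python
# Toy parameter inventory for one transformer-style layer (names and sizes are made up).
layer_params = {
    "attention.weight": 4096 * 4096,
    "attention.bias": 4096,
    "mlp.weight": 4096 * 16384,
    "mlp.bias": 16384,
}


def select_trainable(params, keyword):
    """Selection-based PEFT: only parameters whose name contains `keyword` are trained."""
    return {name: keyword in name for name in params}


def trainable_fraction(params, trainable):
    """Share of all parameter values that would receive gradient updates."""
    total = sum(params.values())
    tuned = sum(count for name, count in params.items() if trainable[name])
    return tuned / total


mask = select_trainable(layer_params, "bias")
print(mask)
print(f"trainable fraction: {trainable_fraction(layer_params, mask):.4%}")
```

Addition-based methods would instead add new entries to the inventory and train only those, while reparametrization-based methods such as LoRA train a compact factorization of the weight update.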

Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning

In that paper, LoRA is classified as Reparametrization-based (R)

LoRA (Low-Rank Adaptation of Large Language Models)

For a more detailed introduction, see the supplementary material

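LoRA (Low-Rank Adaptation)
- Function: LoRA greatly reduces the number of parameters (weights) that must be trained
- Effect: for GPT-3, for example, the authors found the number of trainable parameters could be reduced by 10,000x, which cut the GPU memory required by 3x
- Accuracy: slightly below full finetuning
- Latency: inference latency ends up the same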
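
The reduction in trainable parameters comes from replacing a full d x k weight update with two thin factors of rank r, so the count drops from d * k to r * (d + k). The arithmetic below is a sketch for a single weight matrix; the sizes (d = k = 12288, r = 4) are illustrative assumptions. The per-matrix ratio it prints is not the same thing as the whole-model figure quoted above, which also depends on which matrices are adapted and on the chosen rank.

```python
def full_update_params(d, k):
    """Trainable values when the whole d x k weight matrix is finetuned."""
    return d * k


def lora_update_params(d, k, r):
    """Trainable values for LoRA factors B (d x r) and A (r x k)."""
    return r * (d + k)


d = k = 12288  # assumed layer width, for illustration only
r = 4          # assumed LoRA rank

full = full_update_params(d, k)
lora = lora_update_params(d, k, r)
ratio = full / lora

print(f"full: {full:,}  LoRA: {lora:,}  reduction: {ratio:.0f}x")
```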
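How LoRA works
- Weight training: train new weights in some layers of the model while keeping the main pretrained weights frozen
- Weight colors: in the figure, the main pretrained weights are shown in blue and the new (LoRA) weights in orange
- Math: the new weights are rank-decomposition matrices of the change to the original weights. More importantly, you can train these weights separately, instead of the pretrained weights, and then merge them back into the main pretrained weights at inference time, which gives you the finetuned model more efficiently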
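
The train-separately-then-merge idea can be checked numerically. During training the output is computed as W x + B (A x), with W frozen and only the low-rank factors B and A updated; for inference the product B A is folded into the weights once, W' = W + B A, so no extra computation remains. The sketch below uses tiny hand-picked matrices and plain Python lists; the values and helper names are assumptions for illustration, not anything from the paper's code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Frozen pretrained weight W (3x3) and trainable rank-1 factors B (3x1), A (1x3).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0],
     [3.0, 0.0, 1.0]]
B = [[1.0], [0.0], [2.0]]
A = [[0.5, 0.0, 1.0]]

x = [[1.0], [2.0], [3.0]]  # input as a column vector

# Training-time view: frozen path plus low-rank path, kept separate.
h_separate = add(matmul(W, x), matmul(B, matmul(A, x)))

# Inference-time view: fold the low-rank update into the weights once.
W_merged = add(W, matmul(B, A))
h_merged = matmul(W_merged, x)

print(h_separate)
print(h_merged)
```

Both paths produce the same output, which is why merged LoRA weights add no inference latency compared with the original model.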
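Applications of LoRA
- You could train one model with LoRA on one customer's data, train another on a different customer's data, and merge each in at inference time when needed
  - Original transcript
    
    means you could train a model with LoRa on one customer's data and then train another one on another customer's data and then be able to merge them each in at inference time when you need them.
  - Merging after training on different datasets? The transcript does say "merge", but would that not degrade the effect of each? This still needs to be confirmed
    - The original paper goes no further than saying that parameter sharing allows quick switching at deployment
      
      it allows for quick task-switching when deployed as a service by sharing the vast majority of the model parameters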
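
One reading of the passage above that matches the paper quote is that the adapters are not blended together; rather, one shared set of frozen base weights is combined with whichever customer's adapter a request needs. The sketch below illustrates that reading only. The customer names, matrix values, and the `forward` helper are all hypothetical.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


BASE_W = [[1.0, 0.0],
          [0.0, 1.0]]  # shared frozen pretrained weights

# Hypothetical per-customer rank-1 adapters stored as (B, A) pairs.
ADAPTERS = {
    "customer_a": ([[1.0], [0.0]], [[0.0, 2.0]]),
    "customer_b": ([[0.0], [1.0]], [[3.0, 0.0]]),
}


def forward(x, adapter_name=None):
    """Base output plus the low-rank update of the selected adapter, if any."""
    h = matvec(BASE_W, x)
    if adapter_name is not None:
        B, A = ADAPTERS[adapter_name]
        delta = matvec(B, matvec(A, x))
        h = [hi + di for hi, di in zip(h, delta)]
    return h


x = [1.0, 1.0]
print(forward(x))                # base model only
print(forward(x, "customer_a"))  # base + customer A's adapter
print(forward(x, "customer_b"))  # base + customer B's adapter
```

Because each request uses exactly one adapter, the customers' updates never interfere with each other, and switching costs only a lookup of the small (B, A) pair.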

Supplementary material

PEFT: Parameter-Efficient Finetuning

Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning

A Guide to Parameter-Efficient Fine-Tuning

Explanation in Simplified Chinese

[Machine Learning 2023] (Generative AI). Hung-yi Lee. Finetuning vs. Prompting

2022. Cheng-Han Chiang, Yung-Sung Chuang, Hung-yi Lee. AACL-IJCNLP. Recent Advances in Pre-trained Language Models: Why Do They Work and How to Use Them

From Hung-yi Lee's lab: an excellent introduction to recent (2022) progress in language models

Excerpted below are four important PEFT methods (as of 2022); highly recommended

For details, see: 2022. AACL-IJCNLP. Recent Advances in Pre-trained Language Models: Why Do They Work and How to Use Them

[MLNLP Academic Talk] Episode 1 - Junxian He, Carnegie Mellon University: Towards a Unified View of Parameter-Efficient Transfer Learning

Is the Story Different when Using 0.1% params?
- Multi-head methods — prefix tuning and multi-head adapter — outperform others by at least 1.6 BLEU points
- Comparable performance to full fine-tuning while tuning 6.7% relative size of parameters

Low-rank Matrix Decomposition and LoRA

Low-rank Matrix Decomposition

LoRA

Paper: 2021.06. LoRA: Low-Rank Adaptation of Large Language Models

github/LoRA

Figure 1: Our reparametrization. We only train A and B.

2023.04. lightning.ai. Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA)

[Figure: Regular Fine-tuning vs. LoRA]

2023.01. Hugging Face. Using LoRA for Efficient Stable Diffusion Fine-Tuning

Application to Stable Diffusion models

2023.03. xiao sean. LoRA, a technique for finetuning large language models (LLMs), and generative AI: Stable Diffusion LoRA
