Entry page for AI/ML study notes
Notes on the Deeplearning.ai GenAI/LLM course series
2022。AACL-IJCNLP。Recent Advances in Pre-trained Language Models: Why Do They Work and How to Use Them
Produced by Prof. Hung-yi Lee's lab; an excellent introduction to recent (2022) progress in language models, from the era just before ChatGPT.
Slide outline:
Personal supplement: understanding Prompt Engineering and PEFT (parameter-efficient fine-tuning) from the perspective of numerical distributions
Prompt Engineering
PEFT (Parameter-Efficient Fine-Tuning)
Comparing the two approaches: Prompt Engineering is an intuitive method closer to human thinking, while PEFT is a more mathematical method that needs compute for numerical optimization. Prompt Engineering is conceptually simpler but may demand more creativity and human involvement; PEFT is computationally heavier but offers finer-grained control over the model and can be automated.
Part 4 How to use PLMs: Parameter-efficient fine-tuning
The four most important PEFT methods (as of 2022) are excerpted below.
PLMs are gigantic
These techniques arose in the BERT era of pre-trained models as a way to cut the number of task-specific parameters needed to adapt a model to downstream tasks.
What is standard fine-tuning really doing?
Fine-tuning = modifying the hidden representation based on a PLM

4-1 Adapter
Use special submodules to modify hidden representations!

small trainable submodules inserted in transformers

Small trainable computation modules are inserted after the transformer blocks.
During fine-tuning, only update the adapters and the classifier head

During gradient updates, the pre-trained model is frozen; only the adapters and the classifier head that bridges to the downstream task are updated.
All downstream tasks share the PLM (what we would now call an LLM); the adapters in each layer and the classifier heads are the task-specific modules.
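The bottleneck-adapter idea can be sketched in a few lines of numpy. This is a minimal illustration with hypothetical dimensions, not the tutorial's actual implementation: a down-projection, a nonlinearity, an up-projection, and a residual connection, where only the two projection matrices would be trainable.

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter(h, W_down, W_up):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add.

    h: hidden representations, shape (seq_len, d_model)
    W_down: (d_model, r) and W_up: (r, d_model) are the only trainable weights.
    """
    z = np.maximum(h @ W_down, 0.0)   # ReLU bottleneck, dimension r << d_model
    return h + z @ W_up               # residual keeps the PLM's representation intact

d_model, r = 768, 64                  # hypothetical sizes
W_down = rng.normal(0, 0.02, (d_model, r))
W_up = np.zeros((r, d_model))         # zero-init: the adapter starts as the identity

h = rng.normal(size=(10, d_model))    # stand-in for a transformer block's output
out = adapter(h, W_down, W_up)
assert out.shape == h.shape
assert np.allclose(out, h)            # zero-init up-projection => identity at the start
```

Zero-initializing the up-projection is a common choice so that fine-tuning starts from the unmodified pre-trained behavior.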
4-2 LoRA: Low-Rank Adaptation of Large Language Models
\[ \Delta W = B A, \quad B \in \mathbb{R}^{in \times r}, \; A \in \mathbb{R}^{r \times out} \]
In each Transformer layer, the up-projection and down-projection computation modules are each paired with a set of trainable low-rank parameters.
LoRA design concept

Motivation: Downstream fine-tunings have low intrinsic dimension
Weight update: weight after fine-tuning = \(W_o\) (the original pre-trained weight) + \(\Delta W\) (the update to the weight)
Assumption: the update \(\Delta W\) also has a low intrinsic rank
Even though a huge number of weights could in principle change, effectively only a low-rank portion of the update matters.
Fine-tuned weight: \(W = W_o + \Delta W = W_o + BA\), with rank \(r \ll \min(d_{FFW}, d_{model})\)
The fine-tuned weight \(W\) is the original weight \(W_o\) plus the product of low-rank matrices \(B\) and \(A\), where the rank \(r\) of this product is far smaller than both \(d_{FFW}\) and \(d_{model}\).
All downstream tasks share the PLM; the LoRA in each layer and the classifier heads are the task-specific modules
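The parameter saving from the low-rank factorization is easy to see numerically. A minimal numpy sketch, with hypothetical layer sizes (not tied to any particular model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ffw, r = 1024, 4096, 8        # hypothetical sizes; r << min(d_model, d_ffw)

W0 = rng.normal(size=(d_model, d_ffw))   # frozen pre-trained weight
B = np.zeros((d_model, r))               # trainable; zero-init => Delta W = 0 at start
A = rng.normal(0, 0.02, (r, d_ffw))      # trainable

delta_W = B @ A                          # low-rank update, rank <= r
W = W0 + delta_W                         # the weight actually used at inference

full = d_model * d_ffw                   # parameters a full fine-tune would update
lora = r * (d_model + d_ffw)             # parameters LoRA trains instead
print(full, lora)                        # 4194304 vs 40960: about 1% of the parameters
```

Because \(W_o + BA\) can be merged into a single matrix after training, LoRA adds no inference latency, while each task only needs to store its own small \(B\) and \(A\).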

4-3 Prefix tuning
Prefix tuning is a fine-tuning technique in which only a small set of parameters (the prefix) is trainable while the rest of the model stays fixed.
A trainable prefix is inserted at every layer of the model (prompt tuning only touches the first layer); the prefix parameters are updated during fine-tuning while the model body is not.
Standard Self-Attention

Insert trainable prefix

Only the prefixes (keys and values) are updated during fine-tuning
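The "insert trainable prefix" step can be sketched with plain numpy attention. This is a simplified single-head illustration with hypothetical sizes: the trainable prefix contributes extra key/value vectors that every real token can attend to, while the queries and the model's own projections stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d, seq, prefix_len = 64, 10, 4           # hypothetical sizes

# Q, K, V for the real tokens come from the frozen pre-trained projections.
Q = rng.normal(size=(seq, d))
K = rng.normal(size=(seq, d))
V = rng.normal(size=(seq, d))

# Prefix tuning: prepend trainable key/value vectors; queries are unchanged.
P_k = rng.normal(size=(prefix_len, d))   # trainable
P_v = rng.normal(size=(prefix_len, d))   # trainable

K_ext = np.concatenate([P_k, K], axis=0)       # (prefix_len + seq, d)
V_ext = np.concatenate([P_v, V], axis=0)
attn = softmax(Q @ K_ext.T / np.sqrt(d))       # every token can attend to the prefix
out = attn @ V_ext                             # (seq, d): output length is unchanged
assert out.shape == (seq, d)
```

Note that only the key/value side grows; the output still has one vector per real token, so downstream layers are unaffected.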

Personal thoughts
4-4 (Soft) Prompt tuning / Soft Prompting
(Hard) prompting: add words in the input sentence

Hard Prompts: words (that are originally in the vocabulary)

Fixed words are added directly to the input sentence; these are real words from the original vocabulary.
Soft Prompts: vectors (can be initialized from some word embeddings)
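The hard/soft distinction can be sketched in numpy: hard prompts are looked up from the frozen embedding table, while soft prompts are free vectors prepended in embedding space. Sizes and token ids below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, n_virtual = 100, 32, 5       # hypothetical sizes

E = rng.normal(size=(vocab, d_model))        # frozen word-embedding table of the PLM
soft_prompt = rng.normal(size=(n_virtual, d_model))  # trainable vectors, not real words

token_ids = [7, 42, 3]                       # stand-in for a tokenized input sentence
x = E[token_ids]                             # ordinary (hard) token embeddings
x_ext = np.concatenate([soft_prompt, x], axis=0)  # soft prompt prepended in embedding space
assert x_ext.shape == (n_virtual + len(token_ids), d_model)
```

A hard prompt is constrained to rows of `E`; a soft prompt can be any point in \(\mathbb{R}^{d_{model}}\), which is why it is optimized by gradient descent rather than written by hand.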
Conclusion
PEFT reduces the number of task-specific parameters by focusing on a small part of the model, shrinking model size and compute requirements.
Because PEFT trains fewer parameters, the model is less prone to overfitting the training data, so it performs better on unseen data, especially in out-of-domain settings.
PEFT methods need to tune only a few parameters, which makes them a good fit for training on small datasets, where updating all of a large model's parameters would easily overfit.
Supplementary material
Model Customization
2023.08。Nvidia。Selecting Large Language Model Customization Techniques
Parameter-efficient fine-tuning
Prompt Learning
2023。Nvidia。Prompt Learning - p-tuning and prompt tuning
Provides both the theory and implementation code.

Prompt Tuning
When prompt-tuning a pre-trained GPT model, the soft-prompt embeddings are initialized as a 2D matrix of size total_virtual_tokens X hidden_size. Each task has its own 2D embedding matrix associated with it; tasks share no parameters during training or inference. All GPT model parameters are frozen, and only each task's embedding parameters are updated during training (parameters are independent across tasks).
In prompt tuning, you can specify how each task's embeddings are initialized. You can:
If you choose to initialize the virtual token embeddings from existing embedding weights, you can provide, in the model config, the word string you want to initialize from. This string is tokenized and then tiled or truncated to match the total number of virtual tokens you want to use (total_virtual_tokens). The vocabulary embeddings are copied and used to initialize the soft-prompt embedding matrix for each task; the vocabulary embeddings themselves are not updated or changed during prompt tuning.
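The tile-or-truncate step described above can be sketched as follows. The token ids are hypothetical and this is only an illustration of the described behavior, not NeMo's actual code:

```python
def init_token_ids(word_token_ids, total_virtual_tokens):
    """Tile or truncate a tokenized init string to exactly total_virtual_tokens ids.

    The frozen vocabulary embeddings at these ids would then be *copied* into the
    task's soft-prompt matrix as its starting values.
    """
    ids = []
    while len(ids) < total_virtual_tokens:     # tile if the init string is too short
        ids.extend(word_token_ids)
    return ids[:total_virtual_tokens]          # truncate if it is too long

# Hypothetical token ids for some init string
print(init_token_ids([11, 22, 33], 7))   # [11, 22, 33, 11, 22, 33, 11]
print(init_token_ids([11, 22, 33], 2))   # [11, 22]
```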
P-Tuning
In p-tuning, an LSTM model is used to predict the virtual token embeddings; we call this LSTM the prompt_encoder. The LSTM parameters are randomly initialized when p-tuning begins. All GPT model parameters are frozen, and only the LSTM weights are updated at each training step. The LSTM parameters are shared across all tasks, but the LSTM outputs unique virtual token embeddings for each task. The LSTM-predicted virtual tokens are inserted among the discrete token inputs in exactly the same way as in prompt tuning. You still specify how many virtual tokens to use by setting total_virtual_tokens, and each virtual token embedding is still a 1D vector of size hidden_size.
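The key structural difference from prompt tuning, shared encoder weights producing task-specific virtual token embeddings, can be sketched in numpy. Here a plain RNN cell stands in for NeMo's LSTM prompt_encoder, and the per-task seed vectors are a simplification; everything below is illustrative, not NeMo's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                                   # hypothetical hidden/embedding size

# Shared encoder weights (stand-in for the LSTM prompt_encoder); during p-tuning
# only these would be updated, while all GPT weights stay frozen.
W_in = rng.normal(0, 0.1, (d, d))
W_h = rng.normal(0, 0.1, (d, d))

def prompt_encoder(task_seed, n_virtual):
    """Predict n_virtual virtual-token embeddings from a per-task seed vector."""
    h = np.zeros(d)
    outs = []
    for _ in range(n_virtual):
        h = np.tanh(task_seed @ W_in + h @ W_h)
        outs.append(h)
    return np.stack(outs)                # (n_virtual, d), inserted like soft prompts

seed_a = rng.normal(size=d)              # task A's input to the shared encoder
seed_b = rng.normal(size=d)              # task B's input
emb_a = prompt_encoder(seed_a, 5)
emb_b = prompt_encoder(seed_b, 5)
assert emb_a.shape == (5, d)
assert not np.allclose(emb_a, emb_b)     # shared weights, task-specific outputs
```

This is what enables the cross-task sharing (and interference) discussed next: the encoder weights see gradients from every task.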
Using Both Prompt and P-Tuning
Because p-tuning shares parameters across tasks during training, p-tuning a model on multiple similar tasks may let it share insights across them. Conversely, p-tuning on many very different tasks at once may perform worse than prompt tuning, which tunes an independent set of parameters per task. In general, we recommend p-tuning over prompt tuning.