Entry page for AI / ML study notes
Notes on the Deeplearning.ai GenAI/LLM course series
2022. AACL-IJCNLP. Recent Advances in Pre-trained Language Models: Why Do They Work and How to Use Them
Personal note: understanding Prompt Engineering and parameter-efficient fine-tuning (PEFT) from the viewpoint of numerical distributions
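Prompt Engineering
PEFT (Parameter-efficient Fine-tuning)
Comparing the two: Prompt Engineering is an intuitive approach that is closer to how humans think, whereas PEFT is a more mathematical approach that needs compute for numerical optimization. Prompt Engineering is conceptually simpler but may require more creativity and manual effort. PEFT is computationally more involved, but it offers finer-grained control over the model and can be automated.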
Part 4 How to use PLMs: Parameter-efficient fine-tuning
PLMs are gigantic
What is standard fine-tuning really doing?
Fine-tuning = modifying the hidden representation based on a PLM

4-1 Adapter
Use special submodules to modify hidden representations!

small trainable submodules inserted in transformers

During fine-tuning, only update the adapters and the classifier head

All downstream tasks share the PLM; the adapters in each layer and the classifier heads are the task-specific modules
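The adapter idea above can be sketched in plain Python. This is a minimal illustration, not the implementation from the tutorial: the bottleneck shape (down-projection, ReLU, up-projection, residual connection), the zero-initialised up-projection, and all names (`make_adapter`, `adapter_forward`, `count_params`) are assumptions chosen for clarity.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_adapter(d_model, bottleneck, seed=0):
    """Create the small trainable submodule (hypothetical layout).

    w_down: bottleneck x d_model, small random values.
    w_up:   d_model x bottleneck, zeros, so the adapter starts as an identity map.
    """
    rng = random.Random(seed)
    w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(bottleneck)]
    w_up = [[0.0] * bottleneck for _ in range(d_model)]
    return {"w_down": w_down, "w_up": w_up}


def adapter_forward(adapter, hidden):
    """Modify a hidden representation: hidden + up(relu(down(hidden)))."""
    down = [max(0.0, v) for v in matvec(adapter["w_down"], hidden)]
    up = matvec(adapter["w_up"], down)
    return [h + u for h, u in zip(hidden, up)]


def count_params(adapter):
    """Number of trainable values in one adapter."""
    return sum(len(row) for matrix in adapter.values() for row in matrix)


# The frozen PLM produces `hidden`; only the adapter weights would be trained.
adapter = make_adapter(d_model=8, bottleneck=2)
hidden = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, 1.0]
modified = adapter_forward(adapter, hidden)
```

With `d_model=8` and `bottleneck=2` the adapter holds 32 values, half of the 64 a full 8 x 8 layer would need; the gap widens quickly at realistic hidden sizes.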
4-2 LoRA: Low-Rank Adaptation of Large Language Models
\[ \Delta W = W_A W_B \quad W_A \in \mathbb{R}^{\text{in} \times r}, \, W_B \in \mathbb{R}^{r \times \text{out}} \]

Motivation: Downstream fine-tunings have low intrinsic dimension
Weight update: Weight after fine-tuning = \(W_o\) (pre-trained weight) + \(\Delta W\) (updates to the weight)
Hypothesis: The updates to the weight \(\Delta W\) also have a low intrinsic rank
In other words, although the weight matrix has many entries that could be updated, the update itself can be described with far fewer degrees of freedom.
Fine-tuned weight: \(W = W_o + \Delta W = W_o + BA\), rank \(r \ll \min(d_{FFW}, d_{model})\)
That is, the fine-tuned weight \(W\) is the original weight \(W_o\) plus the product of two low-rank matrices \(B\) and \(A\), where \(r\) is their rank and is much smaller than the smaller of \(d_{FFW}\) and \(d_{model}\).
All downstream tasks share the PLM; the LoRA in each layer and the classifier heads are the task-specific modules
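To make the low-rank update concrete, here is a small sketch of \(W_o + BA\) in plain Python. The shapes (\(B\) as `d_out x r`, \(A\) as `r x d_in`) and the initialisation (zero \(B\), small random \(A\)) follow the convention of the LoRA paper; the function names and the example sizes are assumptions made for illustration.

```python
import random


def matmul(left, right):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in left]


def lora_init(d_out, d_in, rank, seed=0):
    """Return (B, A). B starts at zero, so B @ A is zero before training."""
    rng = random.Random(seed)
    b = [[0.0] * rank for _ in range(d_out)]
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    return b, a


def merged_weight(w_o, b, a):
    """Compute the fine-tuned weight W_o + B @ A; W_o itself stays frozen."""
    delta = matmul(b, a)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(w_o, delta)]


def trainable_params(d_out, d_in, rank):
    """Values trained by LoRA for one weight matrix: r * (d_out + d_in)."""
    return rank * (d_out + d_in)


# Example sizes (assumed): a 1024 x 1024 weight with rank 8.
full = 1024 * 1024
lora = trainable_params(1024, 1024, 8)
print(f"full: {full}, LoRA: {lora}, ratio: {full // lora}x")
```

Because the update is a plain sum, \(BA\) can be merged into \(W_o\) after training, so each task only needs to store its own small \(B\) and \(A\).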

4-3 Prefix tuning
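Insert a trainable prefix at every layer of the model (prompt tuning inserts one only at the first layer). The prefix parameters are updated during fine-tuning, while the main body of the model is not.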
Standard Self-Attention

Insert trainable prefix

Only the prefix (key and value) is updated during fine-tuning
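The effect of a prefix on self-attention can be sketched for a single query and a single head. This is an illustrative simplification, not the tutorial's code: `attend` is ordinary scaled dot-product attention, and `attend_with_prefix` (a hypothetical name) simply prepends the trainable prefix keys and values to the frozen ones.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def attend_with_prefix(query, keys, values, prefix_keys, prefix_values):
    """Attention with trainable prefix keys/values placed before the frozen ones."""
    return attend(query, prefix_keys + keys, prefix_values + values)


# Frozen keys/values come from the PLM; only the prefix would receive gradients.
query = [1.0, 0.0]
keys = [[1.0, 0.0]]
values = [[2.0, 0.0]]
prefix_keys = [[1.0, 0.0]]
prefix_values = [[0.0, 4.0]]

plain = attend(query, keys, values)
prefixed = attend_with_prefix(query, keys, values, prefix_keys, prefix_values)
```

The prefix changes the attention output without touching any PLM weight, which is why one prefix per layer per task is all that needs to be stored.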

4-4 (Soft) Prompt tuning / Soft Prompting
(Hard) prompting: add words in the input sentence

Hard Prompts: words (that are originally in the vocabulary)

Soft Prompts: vectors (can be initialized from some word embeddings)
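The difference between hard and soft prompts can be sketched as follows. The toy embedding table, token ids, and function names are all hypothetical; the point is only that a hard prompt adds token ids before the embedding lookup, while a soft prompt adds free vectors after it.

```python
def embed(token_ids, embedding_table):
    """Look up (and copy) the embedding vector of each token id."""
    return [list(embedding_table[t]) for t in token_ids]


def hard_prompt_input(prompt_ids, token_ids, embedding_table):
    """Hard prompt: prepend vocabulary tokens, then embed everything."""
    return embed(prompt_ids + token_ids, embedding_table)


def init_soft_prompt(init_token_ids, embedding_table):
    """Soft prompt: trainable vectors, here initialised from word embeddings."""
    return embed(init_token_ids, embedding_table)


def soft_prompt_input(soft_prompt, token_ids, embedding_table):
    """Prepend the soft-prompt vectors to the embedded input sentence."""
    return soft_prompt + embed(token_ids, embedding_table)


# Toy vocabulary of 4 tokens with 2-dimensional embeddings (made up for this sketch).
table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sentence = [2, 3]

hard = hard_prompt_input([1], sentence, table)
soft_prompt = init_soft_prompt([1], table)
soft_prompt[0][0] += 0.25  # a training step may move the vector off the vocabulary
soft = soft_prompt_input(soft_prompt, sentence, table)
```

After the update, the soft-prompt vector no longer matches any row of the embedding table, which a hard prompt cannot do; the table itself is unchanged because the initial vectors were copied.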
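PEFT updates only a small, targeted set of parameters, which reduces the number of task-specific parameters and therefore the storage and compute needed per task.
Because PEFT trains fewer parameters, the model is less prone to overfitting the training data. It therefore tends to perform better on unseen data, especially in out-of-domain settings.
PEFT methods fine-tune fewer parameters, which makes them a good fit for training on small datasets, where updating all the parameters of a large model can easily lead to overfitting.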
Model Customization
2023.08. Nvidia. Selecting Large Language Model Customization Techniques
Parameter-efficient fine-tuning
Prompt Learning
2023. Nvidia. Prompt Learning - p-tuning and prompt tuning

Prompt Tuning
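When prompt tuning a pretrained GPT model, the soft prompt embeddings are initialized as a 2D matrix of size total_virtual_tokens X hidden_size. Each task has its own 2D embedding matrix associated with it. Tasks do not share any parameters during training or inference. All GPT model parameters are frozen, and only the embedding parameters of each task are updated during training.
In prompt tuning, you can specify how the embeddings of each task are initialized.
If you choose to initialize the virtual token embeddings from existing embedding weights, you can provide, in the model's config, the string of words you want to use for initialization. This string is tokenized and then tiled (repeated) or truncated to match the total number of virtual tokens you want to use (total_virtual_tokens). The vocabulary embeddings are copied and used to initialize the soft prompt embedding matrix for each task. The vocabulary embeddings themselves are not updated or changed during prompt tuning.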
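The tile-or-truncate initialization described above can be sketched as below. This is not NeMo's code: the helper names, the toy embedding table, and the exact tiling order (repeat the token sequence from the start) are assumptions; only `total_virtual_tokens` and `hidden_size` come from the text.

```python
def tile_or_truncate(token_ids, total_virtual_tokens):
    """Repeat or cut the tokenized init string to exactly total_virtual_tokens ids."""
    if not token_ids:
        raise ValueError("the initialization string produced no tokens")
    repeats = -(-total_virtual_tokens // len(token_ids))  # ceiling division
    return (token_ids * repeats)[:total_virtual_tokens]


def init_task_prompt(token_ids, embedding_table, total_virtual_tokens):
    """Build one task's soft prompt: total_virtual_tokens x hidden_size.

    Rows are copied, so training the prompt never changes the vocabulary embeddings.
    """
    ids = tile_or_truncate(token_ids, total_virtual_tokens)
    return [list(embedding_table[i]) for i in ids]


# Toy vocabulary embeddings with hidden_size = 2 (made up for this sketch).
table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Each task gets its own independent matrix; nothing is shared between tasks.
task_prompts = {
    "sentiment": init_task_prompt([1, 2], table, total_virtual_tokens=5),
    "qa": init_task_prompt([3, 1, 2], table, total_virtual_tokens=2),
}
```

Keeping one matrix per task in a dictionary mirrors the statement that tasks share no parameters: updating the `"sentiment"` prompt leaves the `"qa"` prompt and the vocabulary table untouched.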
Using Both Prompt and P-Tuning
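Because p-tuning shares parameters between tasks during training, p-tuning a model on several similar tasks may let the model share insight between those tasks. By the same token, p-tuning on many very different tasks at once may perform worse than prompt tuning, which tunes a separate set of parameters for each task. In general, we recommend p-tuning over prompt tuning.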