[2025 Hung-yi Lee ML] Lecture 10: Minimally Invasive Surgery for AI: A Brief Introduction to Model Editing
Model Editing: Minimally Invasive Surgery for AI
Model Editing
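Motivation: we want to teach the model up-to-date knowledge, counterfactual (false) knowledge, and so on.
Model Editing: implant a single piece of knowledge.
This differs from
Post-training: learning broader skills (a new language, tool use, reasoning, etc.)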
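▪ Can Model Editing be treated as post-training? Not a good idea.
Model Editing usually has only a single training example, so the model may learn to give that same answer to every input.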
Evaluating Model Editing
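▪ Reliability: for the exact input being edited, the answer changes to the target answer
▪ Generalization: when the input is varied somewhat, the answer should change accordingly
▪ Locality: for other, unrelated inputs, the answer should not change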
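▪ How broadly "Generalization" is defined varies from paper to paper
- paraphrase: does the edit hold for an input that means the same thing as the target input?
- reverse: does the edit hold when the target input and target answer are asked the other way around?
- portability: other properties connected to the target should stay consistent with the edit as well (the hardest)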
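The three criteria above can be sketched as simple accuracy scores. This is only an illustration: the lookup-table "model", the prompts, and the names `accuracy` and `evaluate_edit` are hypothetical, and exact string match stands in for whatever scoring a given paper uses.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]
Case = Tuple[str, str]  # (prompt, expected answer)


def accuracy(model: Model, cases: List[Case]) -> float:
    """Fraction of prompts for which the model returns the expected answer."""
    if not cases:
        return 0.0
    hits = sum(1 for prompt, expected in cases if model(prompt) == expected)
    return hits / len(cases)


def evaluate_edit(
    edited: Model,
    edit_cases: List[Case],
    paraphrase_cases: List[Case],
    unrelated_cases: List[Case],
) -> Dict[str, float]:
    """Score an edited model on reliability, generalization, and locality.

    unrelated_cases pair each unrelated prompt with the answer the model gave
    BEFORE the edit, so locality measures "the answer did not change".
    """
    return {
        "reliability": accuracy(edited, edit_cases),
        "generalization": accuracy(edited, paraphrase_cases),
        "locality": accuracy(edited, unrelated_cases),
    }


# Hypothetical edited model: a lookup table standing in for a real LLM.
_answers = {
    "Where is the Space Needle?": "Taipei",
    "The Space Needle is located in which city?": "Seattle",  # paraphrase missed
    "Where is the Eiffel Tower?": "Paris",
}


def toy_edited_model(prompt: str) -> str:
    return _answers.get(prompt, "unknown")


scores = evaluate_edit(
    toy_edited_model,
    edit_cases=[("Where is the Space Needle?", "Taipei")],
    paraphrase_cases=[("The Space Needle is located in which city?", "Taipei")],
    unrelated_cases=[("Where is the Eiffel Tower?", "Paris")],
)
print(scores)  # {'reliability': 1.0, 'generalization': 0.0, 'locality': 1.0}
```

In this toy case the edit is reliable and local but fails to generalize, which is the failure pattern the paraphrase test is meant to expose.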
Common Model Editing Methods
1. Without changing parameters
▪ In-context Knowledge Editing (IKE)
The model will not necessarily believe the new knowledge you give it
-> it needs some examples that show how to use the new information
e.g., provide demonstrations covering all three aspects: Reliability, Generalization, and Locality
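A minimal sketch of how such a prompt might be assembled, with one demonstration for each of the three aspects. The "New Fact / Prompt / Answer" template, the example facts, and the helper name `build_ike_prompt` are assumptions made for illustration; the notes do not specify the exact format.

```python
from typing import List, Tuple

Demo = Tuple[str, str, str]  # (new fact, question, answer)


def build_ike_prompt(demos: List[Demo], new_fact: str, question: str) -> str:
    """Concatenate demonstrations, then the real edit and the real question."""
    blocks = [f"New Fact: {fact}\nPrompt: {q}\nAnswer: {a}" for fact, q, a in demos]
    # The final block is left open so the model completes the answer.
    blocks.append(f"New Fact: {new_fact}\nPrompt: {question}\nAnswer:")
    return "\n\n".join(blocks)


demo_fact = "The Eiffel Tower is in Rome."
demos = [
    # Reliability: the question matches the new fact directly.
    (demo_fact, "The Eiffel Tower is in", "Rome"),
    # Generalization: a paraphrased question should also follow the new fact.
    (demo_fact, "Which city is home to the Eiffel Tower?", "Rome"),
    # Locality: an unrelated question keeps its original answer.
    (demo_fact, "Big Ben is in", "London"),
]

prompt = build_ike_prompt(
    demos,
    new_fact="The Space Needle is in Taipei.",
    question="The Space Needle is in",
)
print(prompt)
```

The resulting string would be sent to the unmodified model; no parameters are touched.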
2-1. Changing parameters: humans decide how to edit
Humans use their understanding of the model to work out where to update it and how
▪ Rank-One Model Editing (ROME)
step 1: locate the most relevant position to edit
step 2: modify the parameters
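▪ Example: the Space Needle is in Seattle, and we want the model to say it is in Taipei
Goal: find the place in the model most strongly tied to the answer "Seattle", then change the parameters there so that it answers "Taipei"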
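- How to find the position to modify:
First mask out the input "Space Needle" together with its embeddings, then put the original embeddings back one at a time and see which one makes the model output the answer "Seattle" again. That position is an important storage location for this piece of information.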
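Hypothesis: the middle positions store the information relating the two (Space Needle and Seattle), and attention at the end carries that information over to produce the output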
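We want to edit the information inside the feed-forward network (W)
This changes what gets added to the residual stream,
which changes this layer's output,
which changes the answer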
- How to find v*
such that the final output is Taipei:
the paper uses gradient descent to find
this vector v* (an older approach)
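The search for v* can be illustrated with a toy gradient descent. This is a sketch under strong assumptions, not the paper's procedure: everything after the edited layer is replaced by a hypothetical linear readout followed by softmax, and the vocabulary, `readout` matrix, learning rate, and step count are invented for the example.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix: List[List[float]], vec: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def find_v_star(readout, v_init, target, lr=0.5, steps=200):
    """Gradient descent on v to minimize -log p(target | v)."""
    v = list(v_init)
    for _ in range(steps):
        probs = softmax(matvec(readout, v))
        # Gradient of cross-entropy w.r.t. logits: probs - onehot(target).
        dlogits = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
        # Chain rule through logits = readout @ v gives readout^T @ dlogits.
        grad_v = [
            sum(readout[i][j] * dlogits[i] for i in range(len(readout)))
            for j in range(len(v))
        ]
        v = [vj - lr * gj for vj, gj in zip(v, grad_v)]
    return v


cities = ["Seattle", "Taipei", "Paris"]
readout = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy readout
v_original = [2.0, 0.0, 0.0]  # initially favors "Seattle"

v_star = find_v_star(readout, v_original, target=cities.index("Taipei"))
before = softmax(matvec(readout, v_original))
after = softmax(matvec(readout, v_star))
print(cities[before.index(max(before))], "->", cities[after.index(max(after))])
# Seattle -> Taipei
```

In a real model the loss would be backpropagated through all the layers after the edited one rather than through a single matrix.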
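k*: what the preceding words of the input become after passing through the layers (the input to W)
W^*: the parameters after the edit
The paper also has to specify what should not be changed
There is a closed-form solution!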
What this looks like written as a mathematical formula
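For reference, the closed-form solution given in the ROME paper is W^* = W + Λ (C^{-1} k*)^T with Λ = (v* - W k*) / ((C^{-1} k*)^T k*), where C = K K^T is built from the keys that should stay unchanged. The sketch below is a simplified version that assumes C is the identity matrix, which is my simplification and not the paper's setting. It shows the two properties that matter: the edited weights map k* to v*, and a key orthogonal to k* is left untouched. The numbers are made up.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, k: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, k)) for row in W]


def rank_one_edit(W: Matrix, k_star: Vector, v_star: Vector) -> Matrix:
    """Return W + (v* - W k*) k*^T / (k*^T k*), i.e. the update with C = I."""
    residual = [v - wk for v, wk in zip(v_star, matvec(W, k_star))]
    norm_sq = sum(k * k for k in k_star)
    return [
        [W[i][j] + residual[i] * k_star[j] / norm_sq for j in range(len(k_star))]
        for i in range(len(W))
    ]


W = [[1.0, 2.0], [3.0, 4.0]]
k_star = [1.0, 0.0]    # key for the fact being edited
v_star = [5.0, -1.0]   # value that should now be produced for k_star
k_other = [0.0, 1.0]   # an unrelated key, orthogonal to k_star

W_new = rank_one_edit(W, k_star, v_star)
print(matvec(W_new, k_star))   # [5.0, -1.0]  edited key now yields v*
print(matvec(W_new, k_other))  # [2.0, 4.0]   same as matvec(W, k_other)
```

With a non-identity C the unrelated keys are protected in a least-squares sense instead of exactly.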
2-2. Changing parameters: an AI learns how to edit
Instead of humans deciding how to edit -> let another AI learn how to edit
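Give it the input, the output, and θ, and have it output an update e that is applied to the model
An editing model of this kind (one that outputs another model's parameters) is also called a "Hypernetwork"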
This is a form of Meta Learning
For a full introduction to Meta Learning, see the "Machine Learning 2019" course
https://www.youtube.com/playlist?list=PLJV_el3uVTsOK_ZK5L0Iv_EQoL1JefRL4
https://youtu.be/QNfymMRUg3M?si=GQP2H_pGyqLR6cWI
How is the Hypernetwork trained?
We want it to output e_i once it is given the input and output,
but we do not know what e_i should be!
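So the two models are "joined together" and treated as a single neural network,
with e_i viewed as the output of one of its intermediate layers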
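Training: prepare many sets of data
(including examples that should not change, so that with enough data the editor learns not to alter unrelated information)
Testing: no need to prepare examples that should not change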
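One way to write this joint training as an objective is shown below. The notation is assumed for illustration and does not come from the notes: g_φ is the hypernetwork with parameters φ, f_θ is the model being edited (θ stays frozen), (x_i, y_i) is the i-th edit, u_i is an input whose answer should not change, and λ weights the locality term.

$$
\min_{\phi} \sum_i \Big[ \mathcal{L}\big(f_{\theta + e_i}(x_i),\, y_i\big) + \lambda\, \mathcal{L}_{\text{loc}}\big(f_{\theta + e_i}(u_i),\, f_{\theta}(u_i)\big) \Big],
\qquad e_i = g_{\phi}(x_i, y_i, \theta)
$$

The loss never needs a ground-truth e_i: gradients flow through f back into g_φ, which is what treating the two models as one network buys. At test time only g_φ(x, y, θ) is evaluated, so no u is required.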
-> in practice this approach is rather difficult…
▪ What is done in practice
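- The usual way we train a neural network:
use the training data to compute the loss and its gradient g,
then update the model by moving θ along the negative of g (scaled by the learning rate)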
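Following the same pattern, feed g into a neural network to obtain the update e
-> without imposing some constraints on that network it is still very hard to train, because it has too many parameters
(1024^4 for a dense layer that maps the gradient of one 1024 x 1024 weight matrix to an update of the same size, more than the largest current DeepSeek model)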
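The 1024^4 figure can be checked with a few lines of arithmetic. The sketch assumes a single dense layer that takes the flattened gradient of one 1024 x 1024 weight matrix and outputs a flattened update of the same size. The 671 billion comparison figure is the publicly reported total parameter count of DeepSeek-V3 / R1, which I use here as the "largest DeepSeek model"; the notes do not name a specific model.

```python
d = 1024                      # width of one square weight matrix (assumed)
grad_entries = d * d          # entries in the gradient g
update_entries = d * d        # entries in the update e

# One dense layer from g to e needs one weight per (input, output) pair.
naive_params = grad_entries * update_entries

deepseek_v3_params = 671_000_000_000  # assumed reference point, see lead-in

print(f"{naive_params:,}")                # 1,099,511,627,776
print(naive_params > deepseek_v3_params)  # True
```

This counts one layer for one weight matrix only, which is why some structural constraint on the editor is needed.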
▪ MEND
https://arxiv.org/abs/2110.11309
For a single training example, the gradient of a weight matrix can be factored into u times v^T (a rank-one matrix)
For the derivation, see [a video from ten years ago]
[Old video re-upload] DNN Backpropagation (2015 lecture recording)
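The rank-one factorization can be checked numerically on a toy linear layer y = W x with a squared-error loss. In this sketch u is the gradient of the loss with respect to the layer output and v is the layer input. That assignment of names, the loss, and all the numbers are assumptions chosen for illustration. The analytic outer product u v^T is compared against a finite-difference gradient.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def loss(W: Matrix, x: Vector, t: Vector) -> float:
    """Squared error 0.5 * ||W x - t||^2 for one example."""
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(matvec(W, x), t))


def numeric_grad(W: Matrix, x: Vector, t: Vector, eps: float = 1e-6) -> Matrix:
    """Central finite differences, one weight at a time."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            plus = [row[:] for row in W]
            minus = [row[:] for row in W]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (loss(plus, x, t) - loss(minus, x, t)) / (2 * eps)
    return grad


W = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
x = [1.0, 2.0, -1.0]
t = [0.0, 1.0]

u = [yi - ti for yi, ti in zip(matvec(W, x), t)]  # dL/dy, length 2
v = x                                             # layer input, length 3
analytic = [[ui * vj for vj in v] for ui in u]    # outer product u v^T
numeric = numeric_grad(W, x, t)

max_diff = max(
    abs(a - n) for ra, rn in zip(analytic, numeric) for a, n in zip(ra, rn)
)
print(analytic)          # [[-3.5, -7.0, 3.5], [1.0, 2.0, -1.0]]
print(max_diff < 1e-4)   # True
```

The practical consequence is that an editor can work with the two vectors (2 + 3 numbers here) instead of the full matrix (2 x 3 numbers), and the saving grows with the layer width.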
Summary
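This lecture introduced common and classic Model Editing methods
- Without changing parameters
- Changing parameters: humans decide how to edit, or an AI learns how to edit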
More methods: KnowEdit
https://zjunlp.github.io/project/KnowEdit/
– END –