# LoRA: Low-Rank Adaptation of Large Language Models

### Low-Rank-Parametrized Update Matrices

According to [Aghajanyan (2020)](https://arxiv.org/pdf/2012.13255.pdf):

> Pre-trained language models have a low **intrinsic dimension** when fine-tuned on downstream tasks.

In other words, the same performance can be reached with **fewer** dimensions, and under this condition the model can still be trained effectively after a random projection into a smaller subspace.

Based on this observation, the authors hypothesize that the weight updates also have a low **intrinsic rank**.

Let the pre-trained weight be $W_0\in\mathbb{R}^{d\times k}$. The update is factored into two matrices $A$ and $B$: $W_0+\Delta W=W_0+B\cdot A$, with $B\in\mathbb{R}^{d\times r}$, $A\in\mathbb{R}^{r\times k}$, and $r\ll\min(d,k)$ (which greatly reduces the number of trainable parameters).

During training, $W_0$ receives no gradient updates. The input is passed through both $W_0$ and $\Delta W=BA$, and the two output vectors are summed directly:

$$h=W_0x+\Delta Wx=W_0x+BAx$$

$A$ is initialized from a Gaussian distribution and $B$ is initialized to zero, so $\Delta W=BA$ is zero at the start of training.
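To make the update concrete, here is a minimal PyTorch sketch of a LoRA-style linear layer implementing $h=W_0x+BAx$. The class name `LoRALinear` and the dimensions in the usage snippet are illustrative, not from the paper; a real setup would load actual pre-trained weights into $W_0$ and may also scale the low-rank path (e.g. by $\alpha/r$ as in the paper).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of a linear layer with a frozen weight W0 and a trainable low-rank update BA."""
    def __init__(self, d: int, k: int, r: int):
        super().__init__()
        # Frozen pre-trained weight W0 (d x k): no gradient updates.
        self.W0 = nn.Parameter(torch.empty(d, k), requires_grad=False)
        nn.init.normal_(self.W0)  # stand-in for real pre-trained weights
        # Trainable low-rank factors: B (d x r) zero-init, A (r x k) Gaussian-init,
        # so Delta W = B A is zero at the start of training.
        self.B = nn.Parameter(torch.zeros(d, r))
        self.A = nn.Parameter(torch.randn(r, k))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W0 x + B A x : the same input goes through both paths,
        # and the output vectors are summed directly.
        return x @ self.W0.T + x @ (self.B @ self.A).T

# Usage (illustrative dimensions): only A and B receive gradients.
layer = LoRALinear(d=768, k=768, r=8)
x = torch.randn(4, 768)
h = layer(x)  # shape (4, 768)
trainable = [n for n, p in layer.named_parameters() if p.requires_grad]  # ['B', 'A']
```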