tags: `ML數學分部`

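Backpropagation Algorithm

Introduction

Backpropagation (BPP) is a method for computing the gradients used to update a neural network's parameters.
We already know how to update the parameters of a linear regression:
compute the gradient of the loss and use it to update the weights.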

$$w^{1} \leftarrow w^{0} - \eta \frac{\partial L}{\partial w}$$
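As a rough sketch of this update rule, here is one gradient-descent step for a one-input linear regression. The summed squared-error loss, the toy data, and the names `gradient_step` and `loss` are assumptions made for illustration; the note itself does not fix a particular loss.

```python
def loss(w, b, xs, ys):
    """Assumed loss: L = sum_n (w*x_n + b - y_n)^2."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))


def gradient_step(w, b, xs, ys, eta):
    """One update w^1 <- w^0 - eta * dL/dw (and the same for b)."""
    # Partial derivatives of the assumed squared-error loss.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys))
    return w - eta * grad_w, b - eta * grad_b


xs = [0.0, 1.0, 2.0]  # toy inputs, made up for this sketch
ys = [1.0, 3.0, 5.0]  # generated by y = 2x + 1
w0, b0 = 0.0, 0.0
w1, b1 = gradient_step(w0, b0, xs, ys, eta=0.05)
```

With these numbers a single step already lowers the loss, which is all the update rule promises for a small enough $\eta$.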

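A neural network can be viewed as many linear regressions composed together (plus activation functions)
(for details, see the note "From Linear Regression to Neural Networks").

That means there are many weights $w$ whose gradients must be computed,
and backpropagation is an efficient way to compute the gradient of every one of them.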

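Goal

Our end goal is to compute the gradient $\nabla L(\theta)$ that is used when updating the parameters.

Here the loss $L(\theta)$ is defined as

$$L(\theta) = \sum_{n=1}^{N} l^{n}(\theta)$$

where $l^{n}$ is the loss on the $n$-th individual data point.

The gradient can then be written as follows, where $w$ stands for any one of the parameters in $\theta$:

$$\nabla L(\theta) = \frac{\partial L}{\partial w} = \sum_{n=1}^{N} \frac{\partial l^{n}}{\partial w}$$

So our goal is to compute

$$\text{goal} = \frac{\partial l}{\partial w}$$

for each data point and sum the results.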
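To make the "compute per data point, then sum" idea concrete, here is a small sketch. It assumes a one-weight model $y \approx wx$ with squared error; the data and the function names are hypothetical, and the finite-difference value is only there as a sanity check.

```python
def example_loss(w, x, y):
    """l^n: squared error of the assumed model y ~ w*x on one data point."""
    return (w * x - y) ** 2


def example_grad(w, x, y):
    """d l^n / d w for the loss above."""
    return 2 * (w * x - y) * x


def total_loss(w, data):
    """L = sum_n l^n."""
    return sum(example_loss(w, x, y) for x, y in data)


def total_grad(w, data):
    """dL/dw = sum_n d l^n / d w."""
    return sum(example_grad(w, x, y) for x, y in data)


data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0)]  # made-up (x, y) pairs
w = 0.5
eps = 1e-6
# Central finite difference of the total loss, to compare against the sum.
numeric = (total_loss(w + eps, data) - total_loss(w - eps, data)) / (2 * eps)
analytic = total_grad(w, data)
```

The two values agree up to rounding, so summing the per-example gradients does give the gradient of the summed loss.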

Forward Pass & Backward Pass

This section walks through the example from the lecture
(it relies on the chain rule from calculus).

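Suppose our network is the one from the lecture example, where a neuron forms a weighted sum $z$ of its inputs (ignore the activation function for now).
Using the chain rule, our target $\frac{\partial l}{\partial w}$ can be factored as

$$\frac{\partial l}{\partial w} = \frac{\partial z}{\partial w} \frac{\partial l}{\partial z}$$

This splits the job into two tasks,
which are the two steps of BPP:

Forward Pass: run the network to make a prediction, which also gives us $\frac{\partial z}{\partial w}$
Backward Pass: propagate backward to compute $\frac{\partial l}{\partial z}$

Once both are computed, multiplying them gives the target value.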

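Step1. Forward Pass

The forward pass is simply the network computing its prediction $y$.
But what about $\frac{\partial z}{\partial w}$?

Let's expand $z$. Perhaps surprisingly, because

$$z = x_{1} w_{1} + x_{2} w_{2} + b$$

we get

$$\frac{\partial z}{\partial w_{1}} = x_{1}$$

In other words, $\frac{\partial z}{\partial w}$ is just the input attached to that weight (for a later layer, that input is the previous layer's output $a$),

so that's it for the forward pass ~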
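A minimal sketch of this step: the forward pass computes $z$ and, as a by-product, already holds $\frac{\partial z}{\partial w}$ because it is just the inputs. The function name `forward` and the numbers are arbitrary choices for illustration.

```python
def forward(x1, x2, w1, w2, b):
    """Compute z = x1*w1 + x2*w2 + b and return it with dz/dw1, dz/dw2."""
    z = x1 * w1 + x2 * w2 + b
    dz_dw = (x1, x2)  # dz/dw1 = x1 and dz/dw2 = x2: simply the inputs
    return z, dz_dw


z, (dz_dw1, dz_dw2) = forward(x1=1.0, x2=-1.0, w1=1.0, w2=-2.0, b=1.0)

# Sanity check: nudge w1 slightly and see how much z moves.
eps = 1e-6
z_nudged, _ = forward(x1=1.0, x2=-1.0, w1=1.0 + eps, w2=-2.0, b=1.0)
numeric_dz_dw1 = (z_nudged - z) / eps
```

Here `numeric_dz_dw1` comes out very close to `x1`, matching $\frac{\partial z}{\partial w_{1}} = x_{1}$.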

Step2. Backward Pass

$\frac{\partial l}{\partial z}$ cannot be computed directly, so we have to expand it.
At this point we bring the activation function back in: the neuron outputs $a = \sigma(z)$.

Decompose $\frac{\partial l}{\partial z}$:

$$\frac{\partial l}{\partial z} = \frac{\partial a}{\partial z} \frac{\partial l}{\partial a}$$

Here

$$\frac{\partial a}{\partial z} = \sigma'(z)$$

Since $z$ was already computed in the forward pass (otherwise where would your $y$ come from ^^),
$\sigma'(z)$ is just a constant (a scalar).
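As a small illustration, here is $\sigma'(z)$ evaluated from a stored $z$. The note does not say which activation $\sigma$ is, so the logistic sigmoid used here is an assumption.

```python
import math


def sigmoid(z):
    """Logistic sigmoid (assumed choice of activation)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    """Derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


z = 0.0                     # value saved from the forward pass
a = sigmoid(z)              # activation, 0.5 at z = 0
da_dz = sigmoid_prime(z)    # a plain number, 0.25 at z = 0
```

Because `z` is a known number by the time the backward pass runs, `da_dz` is an ordinary scalar that just scales whatever comes back from later layers.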

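Next, we add one more layer after the activation function, so that $a$ feeds two neurons $z'$ and $z''$ through the weights $w_{3}$ and $w_{4}$.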

Continue by decomposing $\frac{\partial l}{\partial a}$:

$$\frac{\partial l}{\partial a} = \frac{\partial z'}{\partial a} \frac{\partial l}{\partial z'} + \frac{\partial z''}{\partial a} \frac{\partial l}{\partial z''}$$

Look closely at $\frac{\partial z'}{\partial a}$. Plugging into the formula for $z$ with $x = a$, that is,

$$z' = a w_{3} + a_{j} w_{j} + \dots$$

differentiating with respect to $a$ leaves only $w_{3}$. In the same way, $\frac{\partial z''}{\partial a} = w_{4}$.

Substituting back into $\frac{\partial l}{\partial z}$:

$$\frac{\partial l}{\partial z} = \sigma'(z) \left[ w_{3} \frac{\partial l}{\partial z'} + w_{4} \frac{\partial l}{\partial z''} \right]$$
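The formula above can be sketched as a single function. It assumes a sigmoid activation and that the two downstream values $\frac{\partial l}{\partial z'}$ and $\frac{\partial l}{\partial z''}$ are already known; the function name and the numbers passed in are made up for illustration.

```python
import math


def sigmoid_prime(z):
    """sigma'(z) for the (assumed) logistic sigmoid."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def backward_through_neuron(z, w3, w4, dl_dz_next1, dl_dz_next2):
    """dl/dz = sigma'(z) * (w3 * dl/dz' + w4 * dl/dz'')."""
    return sigmoid_prime(z) * (w3 * dl_dz_next1 + w4 * dl_dz_next2)


dl_dz = backward_through_neuron(
    z=0.0, w3=2.0, w4=-1.0, dl_dz_next1=0.5, dl_dz_next2=-1.0
)
```

Read this way, the backward step looks like a neuron run in reverse: the downstream gradients are the inputs, $w_{3}$ and $w_{4}$ are the weights, and $\sigma'(z)$ is a fixed multiplier.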

When $z'$ and $z''$ belong to the output layer, $\frac{\partial l}{\partial z'}$ can be computed directly:

$$\frac{\partial l}{\partial z'} = \frac{\partial y_{1}}{\partial z'} \frac{\partial l}{\partial y_{1}}$$
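Both factors depend on choices the note leaves open. As one possible instance, the sketch below assumes a sigmoid output $y_{1} = \sigma(z')$ and the loss $l = \frac{1}{2}(y_{1} - t)^{2}$ with a target $t$; neither assumption comes from the note, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def output_loss(z_out, target):
    """Assumed loss: l = 0.5 * (y1 - target)^2 with y1 = sigmoid(z')."""
    return 0.5 * (sigmoid(z_out) - target) ** 2


def output_delta(z_out, target):
    """dl/dz' = dy1/dz' * dl/dy1 under the assumptions above."""
    y1 = sigmoid(z_out)
    dy_dz = y1 * (1.0 - y1)  # dy1/dz' for a sigmoid output
    dl_dy = y1 - target      # dl/dy1 for the squared-error loss
    return dy_dz * dl_dy


delta = output_delta(z_out=0.0, target=1.0)

# Finite-difference check against the loss itself.
eps = 1e-6
numeric = (output_loss(eps, 1.0) - output_loss(-eps, 1.0)) / (2 * eps)
```

A different output activation or loss only changes the two factors inside `output_delta`; the chain-rule structure stays the same.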

Finally, substitute this result back into

$$\frac{\partial l}{\partial z} = \sigma'(z) \left[ w_{3} \frac{\partial l}{\partial z'} + w_{4} \frac{\partial l}{\partial z''} \right]$$

to obtain $\frac{\partial l}{\partial z}$.

With $\frac{\partial l}{\partial z}$ and $\frac{\partial z}{\partial w}$ in hand, we have the gradient:

$$\frac{\partial l}{\partial w} = \frac{\partial z}{\partial w} \frac{\partial l}{\partial z}$$
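Putting the two passes together, here is a sketch that computes $\frac{\partial l}{\partial w_{1}}$ for a tiny network and compares it with a finite-difference estimate. The sigmoid activations, the squared-error loss, the missing biases in the second layer, and all numeric values are assumptions chosen to keep the example short.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w1, w2, b, w3, w4, x1, x2, t1, t2):
    """Forward pass only: l = 0.5 * ((y1 - t1)^2 + (y2 - t2)^2)."""
    a = sigmoid(x1 * w1 + x2 * w2 + b)
    y1, y2 = sigmoid(a * w3), sigmoid(a * w4)
    return 0.5 * ((y1 - t1) ** 2 + (y2 - t2) ** 2)


def grad_w1(w1, w2, b, w3, w4, x1, x2, t1, t2):
    """dl/dw1 via a forward pass followed by a backward pass."""
    # Forward pass: keep every intermediate value.
    z = x1 * w1 + x2 * w2 + b
    a = sigmoid(z)
    y1, y2 = sigmoid(a * w3), sigmoid(a * w4)
    # Backward pass, starting at the output layer.
    dl_dz_p = y1 * (1 - y1) * (y1 - t1)   # dl/dz'
    dl_dz_pp = y2 * (1 - y2) * (y2 - t2)  # dl/dz''
    dl_dz = a * (1 - a) * (w3 * dl_dz_p + w4 * dl_dz_pp)
    # dl/dw1 = dz/dw1 * dl/dz, and dz/dw1 = x1.
    return x1 * dl_dz


params = dict(w1=1.0, w2=-2.0, b=1.0, w3=2.0, w4=-1.0,
              x1=1.0, x2=-1.0, t1=1.0, t2=0.0)
analytic = grad_w1(**params)

eps = 1e-6
plus = dict(params, w1=params["w1"] + eps)
minus = dict(params, w1=params["w1"] - eps)
numeric = (loss(**plus) - loss(**minus)) / (2 * eps)
```

The two numbers should match to many decimal places, which is a handy way to check a hand-written backward pass.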

Why is it called the Backward Pass?
Write the formulas out (taking $z = z_{1}$ as the example):

$$\frac{\partial l}{\partial z_{1}} = \sigma'(z_{1}) \left[ w_{3} \frac{\partial l}{\partial z_{3}} + w_{4} \frac{\partial l}{\partial z_{4}} \right]$$

$\frac{\partial l}{\partial z_{3}}$ can in turn be expanded:

$$\frac{\partial l}{\partial z_{3}} = \sigma'(z_{3}) \left[ w_{5} \frac{\partial l}{\partial z_{5}} + w_{6} \frac{\partial l}{\partial z_{6}} \right]$$

The same goes for $\frac{\partial l}{\partial z_{4}}$.

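Drawn as a graph, the computation starts from the parameters closest to $y$ and works its way back,
because the partial derivative for an earlier parameter always needs the ones from the layers after it.
Computing in this order is more efficient (otherwise you would have to recompute them every time TT),
which is why it is called the Backward Pass ~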
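The ordering can be sketched as a loop that walks the layers from the output back to the input, so each $\frac{\partial l}{\partial z}$ is computed once and reused. The layout ($z_{1}$ feeding $z_{3}, z_{4}$, which feed $z_{5}, z_{6}$), the two weights leaving $z_{4}$, and all numbers are assumptions for illustration; the $\sigma'$ values are taken as already known from the forward pass.

```python
def backward_deltas(weights, sigma_primes, output_deltas):
    """Return dl/dz for every layer, computed from the output backward.

    weights[k][i][j]   : weight from unit i of layer k to unit j of layer k+1
    sigma_primes[k][i] : sigma'(z) of unit i in layer k (from the forward pass)
    output_deltas      : dl/dz of the output layer
    """
    deltas = [output_deltas]
    # Walk the layers in reverse: the last hidden layer first.
    for W, sp in zip(reversed(weights), reversed(sigma_primes)):
        nxt = deltas[0]  # deltas of the layer just after this one, reused
        layer = [
            sp[i] * sum(W[i][j] * nxt[j] for j in range(len(nxt)))
            for i in range(len(sp))
        ]
        deltas.insert(0, layer)
    return deltas


weights = [
    [[2.0, 4.0]],                # z1 -> (z3, z4): w3, w4
    [[1.0, 0.5], [-1.0, 1.0]],   # z3 -> (z5, z6): w5, w6; z4 -> (z5, z6): hypothetical
]
sigma_primes = [[0.5], [0.5, 0.25]]  # sigma'(z1); sigma'(z3), sigma'(z4)
output_deltas = [1.0, 2.0]           # dl/dz5, dl/dz6

deltas = backward_deltas(weights, sigma_primes, output_deltas)
```

Going forward instead would mean expanding $\frac{\partial l}{\partial z_{3}}$ and $\frac{\partial l}{\partial z_{4}}$ again for every earlier unit that depends on them, which is the recomputation the backward order avoids.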
