
Machine Learning: Neural Networks (Multilayer Perceptron, MLP) with a Detailed Derivation of Backpropagation

A multilayer perceptron is a feed-forward neural network with at least three layers (an input layer, a hidden layer, and an output layer). It is trained by supervised learning, using the backpropagation technique for model learning; that is the traditional definition. With the development of deep learning, an MLP is really just a special case of a deep neural network (DNN): the concepts are essentially the same, a DNN simply uses a few extra tricks during training and has more, deeper layers.


Assume an MLP with the following structure: there are n samples, each sample corresponds to m output values, and there is a single hidden layer with p hidden nodes.
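To make these dimensions concrete, here is a minimal NumPy sketch; the array names X, Y, V, W and the input dimension d are illustrative assumptions, not part of the original setup:

```python
import numpy as np

n, d, p, m = 100, 4, 8, 3          # n samples, d inputs, p hidden nodes, m outputs

X = np.random.randn(n, d)          # inputs, one row per sample
Y = np.random.randn(n, m)          # target outputs, one row per sample

V = np.random.randn(d, p) * 0.1    # V[i, k]: weight from input i to hidden node k
W = np.random.randn(p, m) * 0.1    # W[k, j]: weight from hidden node k to output j
```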


Forward propagation: relatively simple (only linear combinations and nonlinear transformations).
Backward propagation: more involved (because it requires many partial derivatives).

Forward propagation

Input layer to hidden layer


The value flowing from the input layer to hidden node k is s_k, k = 1, …, p, a weighted linear sum of the input signals (v_ik is the weight from the i-th input to the k-th hidden node):

$$s_k = \sum_{i} v_{ik}\, x_i, \qquad k = 1, \dots, p$$

After a nonlinear transformation / activation function (f1), we obtain the output h_k of the k-th hidden node:

$$h_k = f_1(s_k)$$

Hidden layer to output layer


The value flowing from the hidden layer to output node j is z_j, j = 1, …, m, a weighted linear sum of the hidden node outputs (w_kj is the weight from the k-th hidden node to the j-th output):

$$z_j = \sum_{k=1}^{p} w_{kj}\, h_k, \qquad j = 1, \dots, m$$

After a nonlinear transformation / activation function (f2), we obtain the estimated output value ŷ_j:

$$\hat{y}_j = f_2(z_j)$$
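Putting the two stages together, the forward pass for a single sample might look like the following sketch, assuming for illustration (the article does not fix them) that the sigmoid is used for both f1 and f2:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, V, W):
    """Forward pass for one sample x of shape (d,); V is (d, p), W is (p, m)."""
    s = x @ V            # s_k = sum_i v_ik * x_i
    h = sigmoid(s)       # h_k = f1(s_k)
    z = h @ W            # z_j = sum_k w_kj * h_k
    y_hat = sigmoid(z)   # y_hat_j = f2(z_j)
    return s, h, z, y_hat
```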

Backward propagation

The purpose of backpropagation is to use the final objective (loss/cost function) to update the parameters; typically the mean squared error is used as the objective. A larger error means the parameters have not been learned well, so learning continues until the parameters or the error converge.

Let x^(i) be the input of the i-th sample; its estimated output is

$$\hat{y}_j^{(i)} = f_2\big(z_j^{(i)}\big) = f_2\Big(\sum_{k=1}^{p} w_{kj}\, h_k^{(i)}\Big), \qquad j = 1, \dots, m$$

where $h_k^{(i)} = f_1\big(s_k^{(i)}\big)$ comes from the forward pass on $x^{(i)}$.

Its error with respect to the target is

$$E^{(i)} = \frac{1}{2}\sum_{j=1}^{m}\big(y_j^{(i)} - \hat{y}_j^{(i)}\big)^2$$

The sum of the errors over all samples is taken as the objective function:

$$E = \sum_{i=1}^{n} E^{(i)} = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m}\big(y_j^{(i)} - \hat{y}_j^{(i)}\big)^2$$
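In code this objective is just the summed squared error over all samples; the sketch below assumes Y and Y_hat are n×m arrays and uses the same 1/2 factor as above:

```python
import numpy as np

def total_error(Y, Y_hat):
    """E = 0.5 * sum over all samples and outputs of (y - y_hat)^2."""
    return 0.5 * np.sum((Y - Y_hat) ** 2)
```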

The goal of the optimization is to make the sum of squared errors over all samples as small as possible, i.e.

$$\min_{w_{kj},\, v_{ik}} E$$

To find the optimal parameters (the only parameters are w_kj and v_ik), the simplest approach would be to set the derivatives to zero and solve:

$$\frac{\partial E}{\partial w_{kj}} = 0, \qquad \frac{\partial E}{\partial v_{ik}} = 0$$

However, with this many parameters there is no direct unique solution (the partial derivatives worked out below are too complicated to solve in closed form), so we still rely on gradient descent to find the optimum.

Basic familiarity with gradient descent is assumed here.
We use gradient descent to find the optimal parameters (again, only w_kj and v_ik):

$$w_{kj} \leftarrow w_{kj} - \eta\,\frac{\partial E}{\partial w_{kj}}, \qquad v_{ik} \leftarrow v_{ik} - \eta\,\frac{\partial E}{\partial v_{ik}}$$

where η is the learning rate.
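A single gradient-descent step then just moves each weight matrix against its gradient. In this sketch the gradient arrays dE_dW and dE_dV are assumed to have been computed already (for example by the backpropagation formulas derived below):

```python
def gd_step(W, V, dE_dW, dE_dV, lr=0.1):
    """One gradient-descent update: theta <- theta - eta * dE/dtheta."""
    W_new = W - lr * dE_dW
    V_new = V - lr * dE_dV
    return W_new, V_new
```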


These derivatives cannot really be evaluated directly in one step, so we apply the chain rule, which yields the solution much more systematically. Below, backpropagation is derived separately for each layer of connections (the computation is done for a single sample only).

Output layer to hidden layer (w_kj)


chain rule:

$$\frac{\partial E}{\partial w_{kj}}
= \frac{\partial E}{\partial \hat{y}_j}\,
  \frac{\partial \hat{y}_j}{\partial z_j}\,
  \frac{\partial z_j}{\partial w_{kj}}$$

where

$$\frac{\partial E}{\partial \hat{y}_j} = -(y_j - \hat{y}_j), \qquad
\frac{\partial \hat{y}_j}{\partial z_j} = f_2'(z_j), \qquad
\frac{\partial z_j}{\partial w_{kj}} = h_k$$

so that

$$\frac{\partial E}{\partial w_{kj}} = -(y_j - \hat{y}_j)\, f_2'(z_j)\, h_k$$
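For one sample, this gradient can be computed in vectorized form; the sketch again assumes sigmoid activations, whose derivative is f(a)(1 - f(a)):

```python
import numpy as np

def grad_W(x, y, V, W):
    """dE/dW for one sample, with sigmoid assumed for both f1 and f2."""
    h = 1.0 / (1.0 + np.exp(-(x @ V)))               # hidden outputs h_k
    y_hat = 1.0 / (1.0 + np.exp(-(h @ W)))           # predictions y_hat_j
    delta_out = -(y - y_hat) * y_hat * (1 - y_hat)   # -(y_j - y_hat_j) * f2'(z_j)
    return np.outer(h, delta_out)                    # dE/dw_kj = delta_out_j * h_k
```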

Hidden layer to input layer (v_ik)


chain rule:

$$\frac{\partial E}{\partial v_{ik}}
= \frac{\partial E}{\partial h_k}\,
  \frac{\partial h_k}{\partial s_k}\,
  \frac{\partial s_k}{\partial v_{ik}}
= \left(\sum_{j=1}^{m}
  \frac{\partial E}{\partial \hat{y}_j}\,
  \frac{\partial \hat{y}_j}{\partial z_j}\,
  \frac{\partial z_j}{\partial h_k}\right)
  \frac{\partial h_k}{\partial s_k}\,
  \frac{\partial s_k}{\partial v_{ik}}$$

where (the sum over j appears because h_k feeds into every output node)

$$\frac{\partial z_j}{\partial h_k} = w_{kj}, \qquad
\frac{\partial h_k}{\partial s_k} = f_1'(s_k), \qquad
\frac{\partial s_k}{\partial v_{ik}} = x_i$$

Therefore

$$\frac{\partial E}{\partial v_{ik}}
= \left[\sum_{j=1}^{m} -(y_j - \hat{y}_j)\, f_2'(z_j)\, w_{kj}\right]
  f_1'(s_k)\, x_i$$
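The corresponding vectorized computation for one sample, under the same sigmoid assumption:

```python
import numpy as np

def grad_V(x, y, V, W):
    """dE/dV for one sample, with sigmoid assumed for both f1 and f2."""
    h = 1.0 / (1.0 + np.exp(-(x @ V)))               # h_k = f1(s_k)
    y_hat = 1.0 / (1.0 + np.exp(-(h @ W)))           # y_hat_j = f2(z_j)
    delta_out = -(y - y_hat) * y_hat * (1 - y_hat)   # -(y_j - y_hat_j) * f2'(z_j)
    delta_hidden = (W @ delta_out) * h * (1 - h)     # sum_j delta_out_j * w_kj, times f1'(s_k)
    return np.outer(x, delta_hidden)                 # dE/dv_ik = delta_hidden_k * x_i
```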

Finally, the gradients from all n samples are summed to obtain the parameter updates:

$$w_{kj} \leftarrow w_{kj} - \eta \sum_{i'=1}^{n} \frac{\partial E^{(i')}}{\partial w_{kj}}, \qquad
v_{ik} \leftarrow v_{ik} - \eta \sum_{i'=1}^{n} \frac{\partial E^{(i')}}{\partial v_{ik}}$$

where $E^{(i')}$ is the error of the $i'$-th sample (the sample index is written $i'$ here to avoid clashing with the input index $i$ in $v_{ik}$).
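Combined into a single full-batch training step (forward pass, backpropagation, summing the per-sample gradients, updating), a sketch under the same sigmoid assumption might look like this:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_step(X, Y, V, W, lr=0.1):
    """One full-batch gradient-descent step.
    X: (n, d), Y: (n, m), V: (d, p), W: (p, m); sigmoid assumed for f1 and f2."""
    H = sigmoid(X @ V)                                  # hidden outputs, (n, p)
    Y_hat = sigmoid(H @ W)                              # predictions, (n, m)
    delta_out = -(Y - Y_hat) * Y_hat * (1 - Y_hat)      # output-layer deltas, (n, m)
    delta_hidden = (delta_out @ W.T) * H * (1 - H)      # hidden-layer deltas, (n, p)
    dW = H.T @ delta_out                                # sums dE/dw_kj over all samples
    dV = X.T @ delta_hidden                             # sums dE/dv_ik over all samples
    return V - lr * dV, W - lr * dW
```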

That completes the derivation of backpropagation. Doesn't it feel fairly simple once you have read through it? (There is just a bit more notation than usual; the mathematics actually needed is not much.)
PS: One important point: the nonlinear transformations / activation functions (f1, f2) are both differentiated during backpropagation, so the activation functions you choose must be differentiable.
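For example, the sigmoid assumed in the sketches above has a simple closed-form derivative, which is exactly what makes it a valid choice for f1 or f2 here:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_prime(a):
    """Derivative of the sigmoid: f'(a) = f(a) * (1 - f(a))."""
    fa = sigmoid(a)
    return fa * (1 - fa)

# quick numerical check at a = 0.3
eps = 1e-6
print(sigmoid_prime(0.3), (sigmoid(0.3 + eps) - sigmoid(0.3 - eps)) / (2 * eps))
```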

Conclusion

An MLP neural network is simply using gradient descent to find the optimal parameter values.
Plugging those parameters back into the MLP's forward propagation then gives the final predictions.