Introduction to Deep Learning

contributed by <kylekylehaha>

tags: Data Science

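Neuron: the smallest unit of a neural network.

Without an activation function, stacking neurons does not make the model any more expressive: it is still linear.

$w_0, \dots, w_m$ are the parameters to be learned.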
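
To make the point about activation functions concrete, here is a minimal sketch. The weights and the helper names (`linear`, `two_linear_layers`, and so on) are illustrative assumptions, not values from the lecture. Two stacked linear neurons collapse into a single linear neuron, while inserting a ReLU between them breaks that equivalence.

```python
# Illustrative weights only; nothing here comes from the original note.

def linear(w, b, x):
    """A single-input neuron without activation: w * x + b."""
    return w * x + b

def relu(z):
    return max(0.0, z)

def two_linear_layers(x):
    h = linear(2.0, 1.0, x)        # layer 1, no activation
    return linear(-3.0, 0.5, h)    # layer 2, no activation

def equivalent_single_layer(x):
    # -3 * (2x + 1) + 0.5 simplifies to -6x - 2.5: still one linear neuron.
    return linear(-6.0, -2.5, x)

def two_layers_with_relu(x):
    # Same weights, but with a nonlinearity between the layers.
    return linear(-3.0, 0.5, relu(linear(2.0, 1.0, x)))

for x in (-2.0, 0.0, 1.5):
    print(x, two_linear_layers(x), equivalent_single_layer(x), two_layers_with_relu(x))
```

For inputs where the first layer's output is negative, the ReLU version departs from the single linear neuron, which is exactly the extra expressive power the activation provides.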

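The universality theorem tells us that, given enough parameters, a network can approximate a continuous function more and more closely, so more parameters give a better fit. Why, then, do we use a deep network rather than a fat one?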

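Because a deep network can treat each layer as a module and stack each layer's result on top of the previous ones. With only one layer, many more parameters are needed to reach the same performance.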

In addition, given the same amount of data, the deep approach achieves better performance.

Task

  • model: a set of functions.
  • goal: use the training data to find, within the hypothesis function set, the best function $f$ for the task.

This breaks down into 3 steps:

  1. What is the model? (function hypothesis set)
  2. What is the "best" function?
  3. How to pick the "best" function?

Task considered today (how the problem is modeled)
Classification

  • Binary classification (only two classes)
    • Spam filtering
    • Recommendation system
    • Malware detection
    • Stock prediction
  • Multi-class classification (more than two classes)
    • Handwritten digit classification
    • Image recognition

What is the model?

A layer of neuron

Single neuron

  • Can only do binary classification; it cannot handle multi-class classification

So we use multiple neurons to do multi-class classification.

Limitation of single layer

A single neuron can be viewed as a straight line, so no matter how the line is drawn, it cannot separate the outputs of the XOR function.

We can borrow the idea of logic gates and build XOR by combining simpler gates such as AND and OR.

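In other words, inputs that cannot be separated directly can be handled by stacking more neurons to perform a transformation that projects them to a higher-dimensional vector (here the result is still two-dimensional; $x_1, x_2$ are simply transformed into $a_1, a_2$). Once the inputs are expressed as $a_1, a_2$, a single line can separate them.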
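
The transformation above can be sketched with three step-function neurons. This is a minimal illustration under assumed, hand-picked weights (the lecture's actual figures are not available): the hidden neurons compute OR and NAND as $a_1, a_2$, and one more neuron separates the transformed points with a single line.

```python
def step(z):
    """Threshold activation: fires (1) when the weighted sum is positive."""
    return 1 if z > 0 else 0

def neuron(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def hidden_layer(x1, x2):
    a1 = neuron([1, 1], -0.5, [x1, x2])    # OR:   fires unless both inputs are 0
    a2 = neuron([-1, -1], 1.5, [x1, x2])   # NAND: fires unless both inputs are 1
    return a1, a2

def xor_network(x1, x2):
    a1, a2 = hidden_layer(x1, x2)
    # In (a1, a2) space one line, a1 + a2 = 1.5, separates the classes (AND).
    return neuron([1, 1], -1.5, [a1, a2])

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", hidden_layer(x1, x2), "->", xor_network(x1, x2))
```

The hand-picked weights stand in for what training would normally find; the point is only that the hidden layer moves the four inputs to positions a single neuron can split.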


Neural Network

Notation

Relation between Layer Output

A model such as linear regression has relatively few parameters, so it cannot handle harder tasks.


What is the "best" function?

Finding the best function amounts to finding the best parameters. We therefore write $f(x)$ as $f(x; \theta)$, where $\theta = \{W^1, b^1, W^2, b^2, \dots, W^L, b^L\}$.
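
As a rough sketch of what a parameterized network function looks like in code, the snippet below stores the parameters as a list of (W, b) pairs, one per layer, and applies them in order. The sigmoid activation, the layer sizes, and the numeric values are all assumptions made for illustration; the note does not specify them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(W, b, a_prev):
    """One layer: each row of W plus its bias feeds one sigmoid neuron."""
    return [sigmoid(sum(w * a for w, a in zip(row, a_prev)) + bias)
            for row, bias in zip(W, b)]

def f(x, theta):
    """Network output for input x; theta is a list of (W, b) pairs."""
    a = x
    for W, b in theta:
        a = layer_forward(W, b, a)
    return a

# Hypothetical parameters: a 2-input network with 3 hidden neurons and 1 output.
theta = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.3, -0.6, 0.9]], [0.2]),
]
y = f([1.0, 2.0], theta)
print(y)
```

Changing any entry of `theta` changes which function `f` computes, which is why searching for the best function is the same as searching for the best parameters.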

Cost function

A cost function serves as the measure of how good or bad a function is.

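  • $x^r$: the r-th training example
  • $\hat{y}^r$: the ground truth for the r-th example

After $x^r$ passes through f(), the output is a vector. Its entries give the probability that the input is "1", the probability that it is "2", and so on.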
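
The exact cost formula was in a figure that did not survive, so the following is only a sketch under an assumption: the total cost is taken to be the sum, over all training examples, of the squared distance between the network output and the ground-truth vector. The names `squared_distance`, `total_cost`, and `uniform_guess` are hypothetical.

```python
def squared_distance(y, y_hat):
    """Per-example loss (assumed form): squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def total_cost(f, data):
    """Sum the per-example loss over all (x_r, y_hat_r) pairs."""
    return sum(squared_distance(f(x), y_hat) for x, y_hat in data)

def uniform_guess(x):
    # A deliberately poor two-class predictor: always 50/50.
    return [0.5, 0.5]

# Two toy examples with one-hot ground truth vectors.
data = [([0.0], [1.0, 0.0]), ([1.0], [0.0, 1.0])]
cost = total_cost(uniform_guess, data)
print(cost)
```

A lower total means the function's outputs sit closer to the ground truth, so comparing costs is how candidate functions are ranked.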


How to pick the "best" function?

In deep learning the form of the function is fixed; what differs is the parameters. We therefore want a set of parameters that minimizes the training loss on the training data.

Gradient Descent

If we knew what the function looks like, we could differentiate it and solve for the extrema directly. In general we do not, so we use gradient descent.

Compute the gradient at the current point and step in the opposite direction; repeating this eventually leads to a local minimum.
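
A minimal sketch of this update rule on a one-parameter loss follows. The loss $(\theta - 3)^2$, the learning rate, and the step count are illustrative assumptions.

```python
def gradient_descent(grad, theta0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, starting from theta0."""
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad(theta)   # move opposite to the gradient
    return theta

# Toy loss L(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta_star = gradient_descent(lambda t: 2.0 * (t - 3.0), theta0=0.0)
print(theta_star)
```

On this convex toy loss the only minimum is at 3; on a real network loss the same loop can stop at whichever local minimum it reaches first.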


Formal Derivation of Gradient Descent

We can also use the Taylor series to justify gradient descent.

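  • Taylor series: around a chosen point, a function can be expanded so that the expansion stays very close to the true value near that point.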
    We can see that near $x = \pi/4$ the approximation and the true value are essentially the same.
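
The figure for this example is missing, so the sketch below assumes the function being expanded is sin(x) around $x_0 = \pi/4$. It compares a first-order Taylor approximation with the true value close to and far from the expansion point.

```python
import math

def taylor_sin(x, x0, order):
    """Taylor polynomial of sin around x0, truncated at the given order."""
    # The derivatives of sin cycle through sin, cos, -sin, -cos.
    derivs = [math.sin(x0), math.cos(x0), -math.sin(x0), -math.cos(x0)]
    return sum(derivs[k % 4] * (x - x0) ** k / math.factorial(k)
               for k in range(order + 1))

x0 = math.pi / 4
near, far = x0 + 0.1, x0 + 2.0
err_near = abs(taylor_sin(near, x0, 1) - math.sin(near))
err_far = abs(taylor_sin(far, x0, 1) - math.sin(far))
print(err_near, err_far)
```

The error is small only close to the expansion point, which is why the argument for gradient descent restricts each step to a small neighborhood.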

When the circle is small enough, the function can be expanded around the point (a, b).
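
The original derivation was given in figures that are no longer available, so the following is a reconstruction of the standard argument rather than the lecture's exact slides. The symbols $L$ (loss), $u$, $v$, $d$ (radius of the circle), and $\eta$ (learning rate) are assumed names. Inside a small circle around (a, b), the loss is approximately linear, and the linear part is minimized by moving opposite to the gradient:

```latex
\begin{aligned}
L(\theta_1,\theta_2) &\approx L(a,b) + u\,(\theta_1 - a) + v\,(\theta_2 - b),
\qquad
u = \left.\frac{\partial L}{\partial \theta_1}\right|_{(a,b)},\;
v = \left.\frac{\partial L}{\partial \theta_2}\right|_{(a,b)} \\
\text{minimize}\quad & u\,(\theta_1 - a) + v\,(\theta_2 - b)
\quad \text{subject to}\quad (\theta_1 - a)^2 + (\theta_2 - b)^2 \le d^2 \\
\Rightarrow\quad
\begin{bmatrix}\theta_1 \\ \theta_2\end{bmatrix}
&= \begin{bmatrix}a \\ b\end{bmatrix} - \eta \begin{bmatrix}u \\ v\end{bmatrix}
\end{aligned}
```

The inner product of $(u, v)$ with the displacement is smallest when the displacement points directly opposite to $(u, v)$, and $\eta$ scales the step so that it stays inside the circle. The approximation only holds when the circle, and therefore the learning rate, is small.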

To compute the gradient efficiently, we use back propagation.


Forward Data & Backward Error

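First, the data is passed forward through the network to compute the final output, from which the final error is found. The error is then passed backward according to the weights, and finally the parameters are updated.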

Forward Data

Calculate Error

Error Backpropagation



Update weights
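
The four steps above can be sketched on a tiny network with one hidden neuron and one output neuron. The sigmoid activation, the squared-error function, the starting parameters, and the learning rate are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward data: input -> hidden activation a1 -> output y."""
    w1, b1, w2, b2 = params
    a1 = sigmoid(w1 * x + b1)
    y = sigmoid(w2 * a1 + b2)
    return a1, y

def error(params, x, t):
    """Calculate error: half squared difference between output and target t."""
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2

def backward(params, x, t):
    """Error backpropagation: gradients for [w1, b1, w2, b2]."""
    w1, b1, w2, b2 = params
    a1, y = forward(params, x)
    delta2 = (y - t) * y * (1.0 - y)          # error signal at the output neuron
    delta1 = delta2 * w2 * a1 * (1.0 - a1)    # passed back through weight w2
    return [delta1 * x, delta1, delta2 * a1, delta2]

def train_step(params, x, t, lr=0.5):
    """Update weights: one gradient descent step on every parameter."""
    grads = backward(params, x, t)
    return [p - lr * g for p, g in zip(params, grads)]

params = [0.3, -0.1, 0.7, 0.2]   # hypothetical starting values
x, t = 1.5, 1.0
before = error(params, x, t)
for _ in range(200):
    params = train_step(params, x, t)
after = error(params, x, t)
print(before, after)
```

Each hidden-layer error signal reuses the output-layer one instead of recomputing the whole chain, which is where the efficiency of back propagation comes from.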




Practical Issues for neural network

  • Parameter Initialization
  • Learning Rate
  • Stochastic gradient descent and Mini-batch
  • Recipe for Learning

Parameter Initialization


Learning Rate



Stochastic gradient descent and Mini-batch

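We must consider whether the gradients computed from individual data points agree with one another; if they do, stochastic gradient descent can be used.

Gradient descent updates once per epoch. Stochastic gradient descent updates once per example, so 20 examples give 20 updates. Comparing the two after one epoch, stochastic gradient descent has already moved much farther.

Stochastic gradient descent is the case of batch size = 1.

No particular batch size guarantees better accuracy or training time. It is a hyperparameter and has to be chosen through experiments.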
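
The difference in update counts can be seen in a small sketch. It fits a one-parameter model $y = w x$ on 20 synthetic examples; the data, the learning rate, and the helper names `make_batches` and `run_epoch` are assumptions made for illustration.

```python
import random

def make_batches(data, batch_size, rng):
    """Shuffle the data and split it into consecutive batches."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]

def run_epoch(w, data, batch_size, lr, rng):
    """One pass over the data; returns (new_w, number_of_updates)."""
    updates = 0
    for batch in make_batches(data, batch_size, rng):
        # Average gradient of the squared error (w * x - y)^2 over the batch.
        grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
        updates += 1
    return w, updates

# 20 synthetic examples generated with a true weight of 3.
data = [(i / 20.0, 3.0 * i / 20.0) for i in range(1, 21)]
rng = random.Random(0)
for batch_size in (20, 5, 1):
    w, updates = run_epoch(0.0, data, batch_size, lr=0.1, rng=rng)
    print("batch size", batch_size, "updates", updates, "w", w)
```

With batch size 20 the epoch is a single full-batch gradient descent step, while batch size 1 is stochastic gradient descent with 20 updates, so after one epoch the latter has moved much closer to the true weight on this toy problem.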


Recipe for Learning


Concluding Remark


Tips for Deep Neural Network

  • Activation Function
  • Cost Function
  • Data Preprocessing
  • Optimization
  • Generalization

Activation Function

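Today ReLU, rather than sigmoid, is the usual choice of activation function. The derivative of ReLU is either 0 or 1, whereas the derivative of sigmoid lies between 0 and 0.25, so it is always less than 1. During back propagation the gradient is multiplied by this derivative and by the weight at every layer, so it shrinks the further back it is passed, which causes the vanishing gradient problem.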
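
A small numeric sketch of this effect: it multiplies the activation derivative by the weight once per layer, as back propagation does. The depth of 10 layers, the weight of 1.0, and the chosen pre-activation values are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # largest value is 0.25, reached at z = 0

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

def backprop_factor(grad_fn, z, weight, layers):
    """Product of (activation derivative * weight) accumulated over the layers."""
    factor = 1.0
    for _ in range(layers):
        factor *= grad_fn(z) * weight
    return factor

# Best case for sigmoid (z = 0) versus an active ReLU unit (z > 0).
sig = backprop_factor(sigmoid_grad, z=0.0, weight=1.0, layers=10)
rel = backprop_factor(relu_grad, z=1.0, weight=1.0, layers=10)
print(sig, rel)
```

Even in sigmoid's best case the factor falls by a quarter at each layer, while an active ReLU path passes the gradient through unchanged. An inactive ReLU unit (z at or below 0) passes nothing, which is a separate trade-off.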







Cost Function

Softmax maps the outputs to values between 0 and 1 that sum to 1.
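
A minimal softmax sketch follows. Subtracting the maximum before exponentiating is a common numerical-stability choice added here, not something stated in the note, and the input scores are made up.

```python
import math

def softmax(z):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(z)                               # shift scores to avoid overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, -1.0])
print(probs, sum(probs))
```

The largest score receives the largest probability, so the ordering of the raw outputs is preserved while the result can be read as a distribution over classes.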



Optimization

  • Learning Rate
  • Momentum

Learning Rate



Momentum

Addresses the problem of easily getting stuck in a local optimum.

In addition to the gradient, a velocity v is also computed.
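
The note does not give the update formula, so the sketch below assumes the common form in which the velocity is a decaying sum of past gradient steps: v is set to mu * v - lr * gradient, and the parameter moves by v. The names `mu` and `lr`, their values, and the toy loss are assumptions.

```python
def momentum_descent(grad, theta0, lr=0.1, mu=0.9, steps=300):
    """Gradient descent with a velocity term that remembers past steps."""
    theta, v = theta0, 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(theta)   # keep part of the previous movement
        theta = theta + v               # move by the velocity, not the raw gradient
    return theta

# Toy loss L(theta) = theta^2, whose gradient is 2 * theta.
theta_star = momentum_descent(lambda t: 2.0 * t, theta0=5.0)
print(theta_star)
```

Because the velocity can stay nonzero where the gradient is zero, the parameter can keep moving through flat regions or shallow dips where plain gradient descent would stop. This toy loss has a single minimum, so it only demonstrates that the update still converges.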


Generalization

Dropout