Introduction to Deep Learning
contributed by <kylekylehaha>
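Neuron: the smallest unit of a neural network.
Without an activation function, stacking neurons does not make the model more expressive; it stays linear.
The weights and bias are the parameters that need to be learned.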
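To illustrate why a model without an activation function stays linear, here is a minimal sketch. The names `neuron`, `stacked_linear`, and `single_linear`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, activation=None):
    """Weighted sum of the inputs plus a bias, optionally passed through an activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z) if activation else z

# Two stacked neurons with no activation (illustrative parameter values).
w1, b1 = [2.0, -1.0], 0.5
w2, b2 = [3.0], -1.0

def stacked_linear(x):
    h = neuron(x, w1, b1)
    return neuron([h], w2, b2)

# The same function expressed as one linear neuron with merged parameters.
w_merged = [w2[0] * wi for wi in w1]
b_merged = w2[0] * b1 + b2

def single_linear(x):
    return neuron(x, w_merged, b_merged)
```

For any input, `stacked_linear(x)` and `single_linear(x)` agree, so the extra layer adds nothing. Passing `activation=sigmoid` to the first neuron breaks this equivalence.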
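The Universality Theorem tells us that the more parameters a network has, the better it can approximate a function and the better it performs. So why use a deep network rather than a fat one?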
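Because a deep network can treat each layer as a module and build on the results of the previous layers. With only one layer, more parameters are needed to reach the same performance.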
In addition, given the same amount of data, the deep approach achieves better performance.
Task
- model: a function set.
- goal: use the training data to find, within the hypothesis function set, the best function for the task.
This breaks down into 3 steps:
- What is the model? (function hypothesis set)
- What is the "best" function?
- How to pick the "best" function?
Task considered today (how the problem is modeled)
Classification
- Binary classification (only two classes)
  - Spam filtering
  - Recommendation system
  - Malware detection
  - Stock prediction
- Multi-class classification (more than two classes)
  - Handwriting digit classification
  - Image recognition
What is the model?
A layer of neuron
Single neuron
- Can only do binary classification; cannot handle multi-class classification
So we use multiple neurons to do multi-class classification.
Limitation of single layer
A single neuron can be viewed as a straight line, so no matter how the line is drawn, it cannot correctly separate the XOR function.
We can borrow the idea of logic gates and build XOR out of AND and OR.
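In other words, for data that cannot be separated directly, we can stack a few more neurons to perform a transformation that projects the inputs into a new (often higher-dimensional) space. Here the space is still two-dimensional; the inputs are simply mapped to new coordinates. After the transformation, the points can be separated by a single line.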
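The construction can be sketched with hand-set weights. This is one possible decomposition, XOR = AND(OR, NAND), using a step activation; the gates, weights, and names (`step`, `neuron`, `xor_net`) are assumptions for illustration, since the original figures are not available.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted sum is positive."""
    return 1 if z > 0 else 0

def neuron(x, w, b):
    """A single neuron: weighted sum plus bias, followed by the step activation."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

def xor_net(x1, x2):
    # Hidden layer: transform the inputs into two new coordinates.
    h_or = neuron([x1, x2], [1, 1], -0.5)      # OR gate
    h_nand = neuron([x1, x2], [-1, -1], 1.5)   # NAND gate
    # Output layer: in the new coordinates one line (an AND gate) separates the classes.
    return neuron([h_or, h_nand], [1, 1], -1.5)

truth_table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
```

The hidden layer maps the four input points to `(h_or, h_nand)` pairs, and in that transformed space a single neuron is enough to separate the two classes.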
Neural Network
Notation
Relation between Layer Output
A model such as linear regression has relatively few parameters, so it cannot handle harder tasks.
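The relation between layer outputs can be sketched as a repeated "weights times previous output, plus bias, then activation" step. The sigmoid activation, the names `layer_forward` and `network_forward`, and the example weights are assumptions for illustration; the original notation figures are not available.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(a_prev, W, b):
    """One fully connected layer: a = sigmoid(W a_prev + b).

    W is a list of rows (one row of weights per neuron), b is one bias per neuron.
    """
    z = [sum(w * a for w, a in zip(row, a_prev)) + b_i for row, b_i in zip(W, b)]
    return [sigmoid(z_i) for z_i in z]

def network_forward(x, layers):
    """Feed the input through every layer; each layer's output is the next layer's input."""
    a = x
    for W, b in layers:
        a = layer_forward(a, W, b)
    return a

# A tiny 2-2-1 network with illustrative parameters.
layers = [
    ([[1.0, -2.0], [-1.0, 1.0]], [1.0, 0.0]),
    ([[2.0, -1.0]], [0.0]),
]
y = network_forward([1.0, -1.0], layers)
```

Every entry of every `W` and `b` is a learned parameter, which is why a network with several layers has far more parameters than a linear regression on the same input.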
What is the "best" function?
Finding the best function is the same as finding the best parameters, so the function is written with its parameters made explicit.
Cost function
A cost function is used as the measure of how good or bad a function is.
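- The r-th training example
- The ground truth for the r-th training example
After passing through f(), the output is a vector. Its entries give the probability of being "1", the probability of being "2", and so on.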
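As a sketch of the idea, the total cost can be computed by summing a per-example distance between the output vector and the ground truth. The specific distance used in the original figures is not recoverable, so both squared error and cross-entropy are shown here as assumptions; the function names and sample values are hypothetical.

```python
import math

def squared_error(y, target):
    """Sum of squared differences between the output vector and the target vector."""
    return sum((yi - ti) ** 2 for yi, ti in zip(y, target))

def cross_entropy(y, target, eps=1e-12):
    """Cross-entropy between a probability vector y and a one-hot target."""
    return -sum(ti * math.log(yi + eps) for yi, ti in zip(y, target))

def total_cost(outputs, targets, distance):
    """Sum the per-example distance over the whole training set."""
    return sum(distance(y, t) for y, t in zip(outputs, targets))

# Two training examples with three classes each (illustrative values).
outputs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
targets = [[1, 0, 0], [0, 0, 1]]        # one-hot ground truth
wrong_targets = [[0, 1, 0], [1, 0, 0]]

cost_when_right = total_cost(outputs, targets, cross_entropy)
cost_when_wrong = total_cost(outputs, wrong_targets, cross_entropy)
```

A lower cost means the outputs put more probability on the correct classes, so `cost_when_right` is smaller than `cost_when_wrong`.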
How to pick the "best" function?
In deep learning the form of the function is fixed; what differs is the parameters. We therefore want a set of parameters that minimizes the training loss on the training data.
Gradient Descent
If we knew the form of the function, we could find its extrema directly by differentiation. In general we do not know the form of the function, so we use gradient descent.
Compute the gradient at the current point and move in the opposite direction of the gradient, eventually arriving at a local minimum.
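The update rule can be sketched in a few lines. The cost function, learning rate, and step count below are illustrative assumptions, not values from the original notes.

```python
def gradient_descent(grad, theta0, learning_rate=0.1, steps=100):
    """Repeatedly move the parameters a small step against the gradient."""
    theta = list(theta0)
    for _ in range(steps):
        g = grad(theta)
        theta = [t - learning_rate * gi for t, gi in zip(theta, g)]
    return theta

# Illustrative cost: C(theta) = (theta_0 - 3)^2 + (theta_1 + 1)^2, minimized at (3, -1).
def grad_c(theta):
    return [2 * (theta[0] - 3), 2 * (theta[1] + 1)]

theta_star = gradient_descent(grad_c, [0.0, 0.0])
```

This cost has a single minimum, so the iteration converges to it. On a non-convex cost, the same loop may stop at a local minimum that depends on the starting point.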
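We can also use the Taylor series to justify gradient descent.
- Taylor series: around a specific point, a function can be expanded so that the expansion is very close to the function's true value near that point.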
We can see that close to the expansion point, the two values are essentially the same.
When the circle is small enough, we can expand around the point (a, b).
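As a sketch of the argument, assume the cost is written $L(\theta_1, \theta_2)$, the circle has radius $d$, and the gradient components at $(a, b)$ are called $u$ and $v$. These symbols are assumptions, since the original figures are not available. Inside the small circle, the first-order expansion is:

$$
L(\theta_1, \theta_2) \approx L(a, b) + u\,(\theta_1 - a) + v\,(\theta_2 - b),
\qquad
u = \left.\frac{\partial L}{\partial \theta_1}\right|_{(a,b)},\quad
v = \left.\frac{\partial L}{\partial \theta_2}\right|_{(a,b)}
$$

Minimizing the right-hand side subject to $(\theta_1 - a)^2 + (\theta_2 - b)^2 \le d^2$ means making the inner product of $(\theta_1 - a,\ \theta_2 - b)$ with $(u, v)$ as negative as possible, which happens when the step points exactly opposite to $(u, v)$:

$$
\begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}
=
\begin{bmatrix} a \\ b \end{bmatrix}
- \eta
\begin{bmatrix} u \\ v \end{bmatrix},
\qquad \eta > 0
$$

Here $\eta$ is chosen so that the step stays inside the circle. This is the gradient descent update, and it is only trustworthy when the step is small enough for the first-order approximation to hold.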
To compute the gradient efficiently, we use back propagation.
Forward Data & Backward Error
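First, the data is passed forward through the network to compute the final output, from which the final error is obtained. The error is then passed backward according to the weights, and finally the parameters are updated.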
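The four stages (forward data, calculate error, error backpropagation, update weights) can be sketched on a tiny 2-2-1 network. The sigmoid activation, the squared-error loss, the function names, and all numeric values are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, w2, b2):
    """Forward data: return the hidden activations and the network output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
    return h, y

def loss(y, target):
    """Squared-error loss for a single example."""
    return 0.5 * (y - target) ** 2

def backward(x, target, W1, b1, w2, b2):
    """Return the gradients of the loss with respect to every parameter."""
    h, y = forward(x, W1, b1, w2, b2)
    # Calculate error: derivative of the loss w.r.t. the output neuron's pre-activation.
    delta_out = (y - target) * y * (1 - y)
    # Error backpropagation: pass the error back through the output weights.
    delta_hidden = [delta_out * w * hi * (1 - hi) for w, hi in zip(w2, h)]
    grad_w2 = [delta_out * hi for hi in h]
    grad_b2 = delta_out
    grad_W1 = [[d * xi for xi in x] for d in delta_hidden]
    grad_b1 = list(delta_hidden)
    return grad_W1, grad_b1, grad_w2, grad_b2

def update(x, target, W1, b1, w2, b2, lr=0.1):
    """Update weights: one gradient descent step on a single example."""
    gW1, gb1, gw2, gb2 = backward(x, target, W1, b1, w2, b2)
    W1 = [[w - lr * g for w, g in zip(row, g_row)] for row, g_row in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    w2 = [w - lr * g for w, g in zip(w2, gw2)]
    b2 = b2 - lr * gb2
    return W1, b1, w2, b2

# Illustrative example and starting parameters.
x, target = [1.0, 0.0], 1.0
W1, b1 = [[0.5, -0.5], [0.3, 0.8]], [0.0, 0.0]
w2, b2 = [0.4, -0.6], 0.1

loss_before = loss(forward(x, W1, b1, w2, b2)[1], target)
W1, b1, w2, b2 = update(x, target, W1, b1, w2, b2)
loss_after = loss(forward(x, W1, b1, w2, b2)[1], target)
```

The backward pass reuses `delta_out` for every parameter instead of differentiating the whole network separately for each weight, which is what makes back propagation efficient.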

Forward Data

Calculate Error

Error Backpropagation



Update weights



Practical Issues for neural network
- Parameter Initialization
- Learning Rate
- Stochastic gradient descent and Mini-batch
- Recipe for Learning
Parameter Initialization

Learning Rate


Stochastic gradient descent and Mini-batch

We need to consider whether the gradients computed from individual data points are similar to one another. If they are, using stochastic gradient descent is justified.


Gradient descent updates once per epoch, whereas stochastic gradient descent updates once per example: with 20 examples it updates 20 times. So after one epoch each, the stochastic version has already moved much further.

Stochastic gradient descent is the case batch size = 1.

No particular batch size is guaranteed to give better accuracy or training time. It is a hyperparameter and has to be chosen by experiment.
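The difference in update counts can be sketched by fitting a one-parameter model `y = w * x` for a single epoch with different batch sizes. The model, data, learning rate, and function names are hypothetical choices for illustration.

```python
import random

def minibatches(data, batch_size, rng):
    """Shuffle the data and yield it in consecutive chunks of batch_size."""
    indices = list(range(len(data)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

def train_one_epoch(data, w, batch_size, learning_rate=0.05, seed=0):
    """Fit y = w * x with squared error; return the new w and the number of updates."""
    rng = random.Random(seed)
    updates = 0
    for batch in minibatches(data, batch_size, rng):
        # Average gradient of (w * x - y)^2 over the examples in this batch.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= learning_rate * grad
        updates += 1
    return w, updates

# 20 examples generated from the true parameter w = 3.
data = [(k / 10, 3 * k / 10) for k in range(1, 21)]

w_full, n_full = train_one_epoch(data, 0.0, batch_size=20)  # gradient descent
w_mini, n_mini = train_one_epoch(data, 0.0, batch_size=5)   # mini-batch
w_sgd, n_sgd = train_one_epoch(data, 0.0, batch_size=1)     # stochastic
```

With 20 examples, one epoch gives 1 update for full-batch gradient descent, 4 updates for batch size 5, and 20 updates for batch size 1. On this toy problem the stochastic run ends the epoch closer to the true parameter, but that outcome depends on the data and learning rate.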
Recipe for Learning



Tips for Deep Neural Network
- Activation Function
- Cost Function
- Data Preprocessing
- Optimization
- Generalization
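Activation Function
Today ReLU is commonly used as the activation function instead of sigmoid. The derivative of ReLU is either 0 or 1, whereas the derivative of sigmoid lies between 0 and 0.25 and is therefore always less than 1. With sigmoid, each step of back propagation multiplies the gradient by such a derivative (and by a weight), so the gradient tends to shrink the further back it is passed, which causes the vanishing gradient problem.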
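A rough way to see the effect is to multiply the activation derivatives along a path through several layers. This sketch ignores the weights entirely, which is a simplifying assumption; the function names and the depth of 10 are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of sigmoid: s * (1 - s), which peaks at 0.25 when z = 0."""
    s = sigmoid(z)
    return s * (1 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def chained_factor(grad_fn, z, depth):
    """Product of activation derivatives along a path through `depth` layers (weights ignored)."""
    return grad_fn(z) ** depth

# Even in sigmoid's best case (z = 0 at every layer), the factor collapses with depth.
sigmoid_best_case = chained_factor(sigmoid_grad, 0.0, 10)
# An active ReLU path passes the gradient through unchanged.
relu_active = chained_factor(relu_grad, 1.0, 10)
```

After 10 layers the sigmoid factor is at most `0.25 ** 10`, which is below one millionth, while the active ReLU path keeps a factor of 1.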






Cost Function
Softmax maps the outputs to values between 0 and 1 that sum to 1.
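A minimal softmax sketch follows. Subtracting the maximum before exponentiating is a common numerical-stability trick added here, not something stated in the original notes, and the input values are illustrative.

```python
import math

def softmax(z):
    """Map a list of real scores to positive values that sum to 1."""
    m = max(z)  # shifting by the max avoids overflow and does not change the result
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

The largest score receives the largest probability, and the outputs can be read as the class probabilities used by the cost function.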


Optimization
Learning Rate


Momentum
Addresses the problem of easily getting stuck in a local optimum.

In addition to computing the gradient, we also compute a velocity v.
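One common form of the momentum update can be sketched as follows. The exact formula in the original figures is not available, so this classical variant, the coefficient 0.9, and the toy cost are assumptions.

```python
def momentum_descent(grad, theta0, learning_rate=0.1, momentum=0.9, steps=300):
    """Gradient descent with a velocity term that accumulates past gradients."""
    theta = theta0
    v = 0.0
    for _ in range(steps):
        # The velocity keeps a decaying memory of previous steps.
        v = momentum * v - learning_rate * grad(theta)
        theta = theta + v
    return theta

# Illustrative cost: C(theta) = (theta - 3)^2, minimized at theta = 3.
def grad_c(theta):
    return 2 * (theta - 3)

theta_star = momentum_descent(grad_c, 0.0)
```

Because `v` carries over from earlier steps, the parameters keep moving even where the current gradient is close to zero, which is the mechanism that can help them roll past shallow local optima. This convex example only shows the update rule converging; it does not demonstrate escaping a local optimum.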

Generalization
Dropout







