2. HUNG-YI LEE 2022 ML - NN Training tips
【Machine Learning 2021】When Your Neural Network Won't Train, Series (1)~(5)
1. Optimization failure
- When using gradient descent, a gradient of 0 does not necessarily mean we are stuck at a local minimum; we can only say we are stuck at a critical point.
- Around a local minimum there is no downhill direction left; around a saddle point there is still a direction that lowers the loss.
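- How do we tell whether a critical point is a local minimum or a saddle point?
- Mathematical approach: first we need to understand the shape of the loss function around the critical point.
- Given θ′ (at a critical point), the loss L(θ) for θ near θ′ can be approximated with a Taylor series: L(θ) ≈ L(θ′) + (θ − θ′)ᵀg + ½(θ − θ′)ᵀH(θ − θ′), where g is the gradient and H the Hessian of L at θ′. At a critical point g = 0, so the last term decides.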
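- Compute the Hessian matrix H and use it to decide whether the critical point is a local minimum or a saddle point. Checking the sign of vᵀHv for every v is clearly impractical, so we use the eigenvalues of H instead: all positive means a local minimum, all negative a local maximum, and mixed signs a saddle point.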
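- If it is a saddle point, H also shows the way out: move along an eigenvector u of H whose eigenvalue λ is negative, because uᵀHu = λ‖u‖² < 0, so the loss goes down.
- In practice this is the slowest approach and is rarely used, because computing the eigenvalues of H is expensive.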
- A worked example:
- Summary: according to empirical studies, training is usually not stuck at a local minimum.
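The Hessian test and the saddle-point escape described above can be sketched on a tiny example. This assumes a hypothetical toy loss L(w1, w2) = (1 − w1·w2)², i.e. one training pair x = 1, ŷ = 1 and a model y = w1·w2·x; all function names are illustrative, and the closed-form 2×2 eigenvalues stand in for a general eigen-solver.

```python
import math

def loss_toy(w1, w2):
    # Hypothetical toy loss: one pair x = 1, y_hat = 1, model y = w1 * w2 * x
    return (1 - w1 * w2) ** 2

def hessian_toy(w1, w2):
    # Second derivatives of loss_toy, worked out by hand
    return [[2 * w2 ** 2, -2 + 4 * w1 * w2],
            [-2 + 4 * w1 * w2, 2 * w1 ** 2]]

def eigen_2x2_symmetric(h):
    """Eigenvalues (ascending) of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = h[0][0], h[0][1], h[1][1]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius

def classify_critical_point(h, tol=1e-12):
    lo, hi = eigen_2x2_symmetric(h)
    if lo > tol:
        return "local minimum"   # all eigenvalues positive
    if hi < -tol:
        return "local maximum"   # all eigenvalues negative
    if lo < -tol and hi > tol:
        return "saddle point"    # mixed signs
    return "inconclusive"        # a zero eigenvalue: the second-order test says nothing

def escape_saddle(theta, h, step=0.5):
    """Move along the unit eigenvector u with negative eigenvalue (u^T H u < 0)."""
    a, b, d = h[0][0], h[0][1], h[1][1]
    lam, _ = eigen_2x2_symmetric(h)
    if lam >= 0:
        raise ValueError("H has no negative eigenvalue")
    if b == 0:
        # H is diagonal: the eigenvector is the axis with the smaller entry
        ux, uy = (1.0, 0.0) if a <= d else (0.0, 1.0)
    else:
        # (b, lam - a) solves (H - lam * I) u = 0 for a symmetric 2x2 H
        norm = math.hypot(b, lam - a)
        ux, uy = b / norm, (lam - a) / norm
    return theta[0] + step * ux, theta[1] + step * uy

theta0 = (0.0, 0.0)                 # the gradient of loss_toy is zero here
H = hessian_toy(*theta0)            # [[0.0, -2.0], [-2.0, 0.0]]
print(eigen_2x2_symmetric(H))       # (-2.0, 2.0)
print(classify_critical_point(H))   # saddle point
theta1 = escape_saddle(theta0, H)
print(loss_toy(*theta0), loss_toy(*theta1))  # 1.0, then about 0.7656
```

The point (0, 0) has zero gradient, but one eigenvalue is negative, so it is a saddle point, and a step along the corresponding eigenvector lowers the loss. For a real network H has one row per parameter, which is why the notes call this the slowest approach.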
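Effect of batch size
- A large batch size gives more stable updates; a small batch size produces noisier gradients.
- Thanks to GPU parallelism, one update with a large batch takes not much longer than one with a small batch (up to a limit), so a full epoch actually finishes faster; in terms of time, a large batch is not worse and even has an advantage.
- Optimization issue: even on the training set, an overly large batch size performs worse.
- A large batch gets stuck more easily where the gradient is 0; with small batches each batch defines a slightly different loss, so a point that is critical for one batch usually is not for the next, and the noisy updates keep training moving.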
- Overfitting: small batch sizes tend to perform better on the testing set, even when training performance is similar.
- Explanation (still under research)
- Each has pros and cons, so batch size is a hyperparameter to tune.
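A quick sketch of the "small batch, noisier gradient" point, on a hypothetical 1-D linear regression dataset (make_data, batch_gradient and gradient_spread are illustrative helpers, not from the lecture). It fixes w, draws many random mini-batches, and measures how much the mini-batch gradient varies from batch to batch.

```python
import random
import statistics

def make_data(n=1000, seed=0):
    # Hypothetical 1-D regression data: y = 3x + Gaussian noise
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys

def batch_gradient(w, xs, ys):
    # d/dw of the mean of (w*x - y)**2 over the examples in the batch
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def gradient_spread(batch_size, xs, ys, w=0.0, trials=500, seed=1):
    # Standard deviation of the mini-batch gradient over many random batches
    rng = random.Random(seed)
    grads = []
    for _ in range(trials):
        idx = rng.sample(range(len(xs)), batch_size)
        grads.append(batch_gradient(w, [xs[i] for i in idx], [ys[i] for i in idx]))
    return statistics.stdev(grads)

xs, ys = make_data()
for b in (1, 10, 100, 1000):
    print(b, round(gradient_spread(b, xs, ys), 4))
```

The spread shrinks roughly like 1/√B, and with B equal to the whole dataset every "batch" gives the same gradient. The sketch says nothing about wall-clock time, which is where GPU parallelism comes in.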
momentum
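The "training stuck" problem
- As training keeps updating the parameters, the loss keeps going down; but when the loss stops decreasing, the gradient is sometimes still far from small! So training is not always stuck at a critical point (local minimum or saddle point).
- When gradient descent gets stuck, the cause is often not a critical point but the learning rate: a single fixed learning rate is too large for steep directions (so the parameters bounce between the walls of a valley) and too small for flat ones, so they never reach the minimum.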
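A minimal sketch of the learning-rate problem, assuming a hypothetical elongated error surface L(x, y) = x² + 100y² (100 times steeper in y than in x). Plain gradient descent with a fixed learning rate is run for 100 steps, and the function reports the final loss and gradient norm.

```python
def loss(x, y):
    # Hypothetical elongated error surface: 100x steeper in y than in x
    return x ** 2 + 100 * y ** 2

def grad(x, y):
    return 2 * x, 200 * y

def gradient_descent(lr, steps, start=(1.0, 1.0)):
    x, y = start
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    gx, gy = grad(x, y)
    return loss(x, y), (gx ** 2 + gy ** 2) ** 0.5  # final loss and gradient norm

# lr = 0.01: y flips between +1 and -1 forever, so the loss stalls near 100
# while the gradient norm stays near 200 (stuck, but not at a critical point)
print(gradient_descent(lr=0.01, steps=100))
# lr = 0.001: y settles quickly, but x shrinks by only 0.2% per step
print(gradient_descent(lr=0.001, steps=100))
```

Neither value suits both directions at once, which is the motivation for giving each parameter its own learning rate.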
Customized learning rates
Effect of using an adaptive learning rate
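- Problem: after accumulating many very small gradients, the term that divides the learning rate becomes tiny, so the effective step blows up and the parameters suddenly shoot off.
- Solution: learning rate scheduling
- Add one of the following two mechanisms:
- learning rate decay: shrink the learning rate as training proceeds
- warm up ("black magic")
- the learning rate first increases, then decreases
- It already appears in much older papers.
- Why it works still needs further research (the field has not fully explained it); see RAdam, arXiv 1908.03265.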
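The notes do not fix a particular schedule. As one concrete, widely used example (an assumption, not necessarily the one shown in the lecture), here is the warm-up schedule from the Transformer paper "Attention Is All You Need": the learning rate grows linearly for warmup_steps steps and then decays like 1/√step. d_model = 512 and warmup_steps = 4000 are that paper's base settings.

```python
def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Warm-up schedule from "Attention Is All You Need":
    lr = d_model**-0.5 * min(step**-0.5, step * warmup_steps**-1.5)
    Linear increase for warmup_steps steps, then decay proportional to 1/sqrt(step).
    """
    step = max(step, 1)  # the formula is undefined at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

for s in (1, 1000, 4000, 16000, 64000):
    print(s, f"{transformer_lr(s):.6f}")
```

The peak is at step = warmup_steps, where the two terms inside min() are equal; after that, quadrupling the step halves the learning rate.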
Optimization summary
Evolved gradient descent
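The evolved gradient descent combines three ideas in one per-parameter update, θ ← θ − (η_t / σ) · m: momentum m (keeps the direction of past gradients), σ (the size of past gradients, magnitude only), and a scheduled η_t. The sketch below assumes one concrete choice: m as an exponential moving average of gradients and σ as the square root of an EMA of squared gradients (RMSProp style). This is essentially Adam without its bias-correction terms, and it reuses the hypothetical surface L(x, y) = x² + 100y².

```python
import math

def evolved_gd(grad_fn, theta, steps, lr_schedule, beta1=0.9, beta2=0.999, eps=1e-8):
    """Per-parameter update theta_i <- theta_i - (eta_t / sigma_i) * m_i.

    m_i:     exponential moving average of gradients (momentum: keeps direction)
    sigma_i: square root of an EMA of squared gradients (magnitude only)
    eta_t:   learning rate from lr_schedule(t)
    """
    theta = list(theta)
    m = [0.0] * len(theta)
    v = [0.0] * len(theta)
    for t in range(1, steps + 1):
        g = grad_fn(theta)          # gradient at the start of this step
        eta = lr_schedule(t)
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            sigma = math.sqrt(v[i]) + eps
            theta[i] -= eta / sigma * m[i]
    return theta

def grad_bowl(theta):
    # Gradient of the hypothetical surface L(x, y) = x**2 + 100 * y**2
    x, y = theta
    return [2 * x, 200 * y]

# After one step both coordinates move by the same amount, although the
# y-gradient is 100x larger: sigma divides out the gradient's magnitude.
print(evolved_gd(grad_bowl, [1.0, 1.0], steps=1, lr_schedule=lambda t: 0.01))
# With a constant eta the iterate approaches the minimum at (0, 0).
print(evolved_gd(grad_bowl, [1.0, 1.0], steps=1000, lr_schedule=lambda t: 0.01))
```

Passing a schedule such as a warm-up function as lr_schedule gives the full combination; a constant is used here only to keep the example simple.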
classification
- Full version:
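- For ease of explanation, this lecture defines:
- ŷ ("y hat") as the correct answer and y as the predicted output
- one-hot vector: an n × 1 vector with a single 1 and 0 everywhere else (one dimension per class)
- softmax(): turns a vector y of arbitrary values into y′, whose entries lie between 0 and 1 and sum to 1, so it can be compared with the one-hot ŷ (softmax itself does not output a one-hot vector)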
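- The input vector y is called the logit.
- Benefits: it normalizes the outputs and, because of the exponential, enlarges the gaps between large and small values.
- For binary classification, sigmoid is used more often.
- On reflection, this means the same thing as applying softmax to two classes: softmax([z₁, z₂])₁ = σ(z₁ − z₂).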
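A minimal softmax in plain Python, plus a check of the two-class equivalence with sigmoid. Subtracting the maximum logit before exponentiating is a standard overflow guard that does not change the result; the logits [3, 1, −3] are just example values.

```python
import math

def softmax(logits):
    """Map arbitrary real logits to probabilities in (0, 1) that sum to 1."""
    shift = max(logits)  # subtract the max so exp() cannot overflow; result unchanged
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

probs = softmax([3.0, 1.0, -3.0])
print([round(p, 4) for p in probs])  # [0.8789, 0.1189, 0.0022]

# Two classes: softmax([z1, z2])[0] = e^z1 / (e^z1 + e^z2) = sigmoid(z1 - z2)
print(softmax([2.0, -1.0])[0], sigmoid(2.0 - (-1.0)))
```

The ordering of the logits is preserved, the outputs sum to 1, and the gap between the largest and the rest is stretched by the exponential.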
loss of classification
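- PyTorch: when you use cross-entropy (nn.CrossEntropyLoss), softmax is applied automatically inside the loss, so there is no explicit softmax at the end of the network in the TA code XD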
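- cross-entropy
- Commonly used for classification problems
- It is a loss function: e = −Σᵢ ŷᵢ ln y′ᵢ
- Mathematical argument for why it is used in classification: minimizing cross-entropy is equivalent to maximizing likelihood
- An example illustrating why it is used in classification: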
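- In the left figure (MSE), where training gets stuck the loss is large but the error surface is very flat, so the gradient is close to zero; with cross-entropy the surface still slopes there, so training can keep going.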
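A small numeric sketch of the flat-surface argument, assuming a hypothetical 3-class example whose logits are very confident in the wrong class. It compares the gradient with respect to the logits for MSE on softmax outputs versus cross-entropy. The cross-entropy gradient softmax(z) − ŷ is the quantity PyTorch's nn.CrossEntropyLoss works with, since that loss combines log-softmax and negative log-likelihood on raw logits; the helper names here are illustrative.

```python
import math

def softmax(z):
    shift = max(z)  # overflow guard
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(z, target):
    p = softmax(z)
    # Terms with target 0 contribute nothing, so skip them (avoids log(0))
    return -sum(t * math.log(pi) for t, pi in zip(target, p) if t > 0)

def grad_cross_entropy(z, target):
    # d(cross-entropy)/dz for softmax outputs: p - y_hat
    return [pi - t for pi, t in zip(softmax(z), target)]

def mse(z, target):
    return sum((pi - t) ** 2 for pi, t in zip(softmax(z), target))

def grad_mse(z, target):
    p = softmax(z)
    e = [2 * (pi - t) for pi, t in zip(p, target)]  # dMSE/dp
    dot = sum(pi * ei for pi, ei in zip(p, e))
    # Chain rule through softmax: dp_j/dz_i = p_j * (delta_ij - p_i)
    return [pi * (ei - dot) for pi, ei in zip(p, e)]

target = [1.0, 0.0, 0.0]     # one-hot y_hat
z_bad = [-10.0, 10.0, 0.0]   # hypothetical logits, confident in the wrong class
print("MSE:", mse(z_bad, target), max(abs(g) for g in grad_mse(z_bad, target)))
print("CE: ", cross_entropy(z_bad, target), max(abs(g) for g in grad_cross_entropy(z_bad, target)))
```

With these logits the MSE loss is close to its maximum of about 2, yet its largest gradient entry is on the order of 1e-4, which is the flat, stuck region. Cross-entropy is large as well, but its gradient has an entry of about 1, so gradient descent can still move.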
Summary
Changing the loss function (which reshapes the error surface) can change how hard optimization is.
- HW3 will use batch normalization (CNN).
changing landscape
HW02
How Python's with statement works and how to use it