contributed by <weian312>
Mixed Precision Training
2018 ICLR
Sharan et al. (Baidu), Paulius et al. (NVIDIA)
Half-Precision (FP16, binary16)
Smallest positive value (subnormal, mantissa bits only): 2^-24, about 5.96e-8
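The FP16 limits can be checked numerically. This is a minimal sketch, assuming NumPy's `float16` follows IEEE binary16; the variable names are made up for illustration.

```python
import numpy as np

info = np.finfo(np.float16)

# Smallest positive *normal* FP16 value: 2**-14
smallest_normal = float(info.tiny)

# Smallest positive *subnormal* FP16 value: 2**-24 (exactly representable)
smallest_subnormal = float(np.float16(2.0 ** -24))

# Largest finite FP16 value
largest = float(info.max)

# Anything well below 2**-24 rounds to zero when cast to FP16
underflow = float(np.float16(2.0 ** -26))

print(smallest_normal, smallest_subnormal, largest, underflow)
```

The last value is the interesting one for training: small numbers do not raise an error in FP16, they silently become zero.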
Implementation
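Let's jump straight to the implementation!
First, a quick review of deep learning training.
- Structure and forward pass
A neural network is made up of
1. Weights & Bias
2. Activations
In the forward pass, the input data is multiplied by the weights, the bias is added, and the result goes through the activations to produce the output. The output and the ground-truth labels are then fed into the loss function to compute the loss.
- Backpropagation
Starting from the loss, the chain rule gives the partial derivative (the gradient) at every node. An optimizer (e.g. SGD) then uses these gradients to update the weights.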
SGD (stochastic gradient descent): w ← w − η · ∂L/∂w, where η is the learning rate
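The review above can be sketched in NumPy. This is only a toy illustration: the data, the layer size, the tanh activation, the mean-squared-error loss, and the learning rate are all assumptions made for the example, and the same batch is reused at every step.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))        # made-up batch: 8 samples, 3 features
y = rng.normal(size=(8, 1))        # made-up targets
W = rng.normal(size=(3, 1)) * 0.1  # weights
b = np.zeros((1,))                 # bias
lr = 0.05                          # learning rate (eta)

losses = []
for step in range(100):
    # Forward pass: weights and bias, then the activation, then the loss
    z = x @ W + b
    a = np.tanh(z)
    losses.append(float(np.mean((a - y) ** 2)))

    # Backward pass: chain rule from the loss back to each parameter
    dL_da = 2.0 * (a - y) / a.size
    dL_dz = dL_da * (1.0 - a ** 2)   # derivative of tanh
    dL_dW = x.T @ dL_dz
    dL_db = dL_dz.sum(axis=0)

    # SGD update: w <- w - lr * gradient
    W -= lr * dL_dW
    b -= lr * dL_db

print(losses[0], losses[-1])
```

Everything here runs in NumPy's default float64, so none of the precision issues discussed later show up yet.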
FP32 MASTER COPY OF WEIGHTS
Figure first; this one actually explains things quite clearly.
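- Weights, gradients, and activations are all stored in half precision, and an extra single-precision copy of the weights is kept.
- The forward and backward passes are both computed in half precision. Only the last step of backpropagation, the weight update (where the computed gradients are handed to the optimizer), is done in single precision, and the result is saved as the new single-precision weights.
There are two explanations for this: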
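One way to picture the two bullets above is the sketch below. It is a toy under stated assumptions, not the paper's implementation: the model is plain linear regression, `mixed_precision_sgd_step` is a hypothetical helper name, and the data and hyperparameters are invented.

```python
import numpy as np

def mixed_precision_sgd_step(master_w, x, y, lr):
    """One SGD step: FP16 forward/backward, FP32 weight update."""
    # FP16 copies used for the actual computation
    w16 = master_w.astype(np.float16)
    x16 = x.astype(np.float16)
    y16 = y.astype(np.float16)

    # Forward and backward passes in half precision
    pred = x16 @ w16
    err = pred - y16
    grad16 = (x16.T @ err) / np.float16(len(x16))

    # Weight update in single precision, applied to the FP32 master copy
    new_master = master_w - np.float32(lr) * grad16.astype(np.float32)
    return new_master, grad16

# Made-up regression problem
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4)).astype(np.float32)
true_w = np.array([[1.0], [-2.0], [0.5], [3.0]], dtype=np.float32)
y = x @ true_w

master_w = np.zeros((4, 1), dtype=np.float32)
for _ in range(300):
    master_w, grad16 = mixed_precision_sgd_step(master_w, x, y, lr=0.2)

print(master_w.ravel())
```

The FP16 weights are thrown away and recreated from the master copy at every step, so small updates accumulate in FP32 instead of being rounded off.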
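- The gradient still has to be multiplied by the learning rate (e.g. by 10^-4, which drops it four orders of magnitude), while the smallest positive half-precision value only goes down to 2^-24 (so the update turns into zero and nothing gets updated XD)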
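- The weights are too large relative to the gradients. If the weight update is done in half precision (i.e. no single-precision copy of the weights is kept), the update still gets zeroed out in the arithmetic (the paper explains this more clearly, take a look if you are interested)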
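Both effects are easy to reproduce with NumPy scalars. The specific numbers below (learning rate, gradient, weight) are hypothetical values picked to trigger each case, not values from the paper.

```python
import numpy as np

# Reason 1: learning rate * gradient underflows in FP16
lr = np.float16(1e-4)
grad = np.float16(2.0 ** -14)                 # small but still representable
update16 = lr * grad                          # about 6e-9, below 2**-24
update32 = np.float32(lr) * np.float32(grad)  # same product in FP32

# Reason 2: the update is representable, but tiny next to the weight
w16 = np.float16(1.0)
small_update = np.float16(2.0 ** -13)         # nonzero in FP16
w16_after = w16 - small_update                # rounds back to 1.0
w32_after = np.float32(w16) - np.float32(small_update)

print(float(update16), float(update32))
print(float(w16_after), float(w32_after))
```

In the first case the update itself becomes zero; in the second the update survives but is rounded away when added to the weight. Doing the update in FP32 avoids both.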
Figure 2a in the paper: mixed precision training with and without the single-precision copy of the weights
(dev0 is the validation set)
Figure 2b in the paper: the values that get zeroed out during training
Tensorflow Guide
Paper