---
title: mixed_precision implementation
tags: implementation practice
---
## Refer to the [TensorFlow tutorial](https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/guide/keras/mixed_precision.ipynb)
Overflow & underflow testing of float16
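A quick sketch of that behaviour, following the pattern used in the linked tutorial (squaring values so they fall outside float16's representable range):
```
import tensorflow as tf

x = tf.constant(256.0, dtype='float16')
print((x ** 2).numpy())   # inf: 256^2 = 65536 exceeds float16's max of 65504 (overflow)

y = tf.constant(1e-5, dtype='float16')
print((y ** 2).numpy())   # 0.0: 1e-10 is below float16's smallest positive value (underflow)
```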

First, the setup: import TensorFlow (version 2.1).
I pin the job to GPU 1.
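A minimal sketch of that setup; selecting GPU 1 through `CUDA_VISIBLE_DEVICES` is an assumption here, and the `keras`/`layers` imports are reused by the later snippets:
```
import os
# Assumption: expose only GPU 1 to TensorFlow (must be set before TF touches the GPUs)
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)  # expecting 2.1.x
```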

Following the TensorFlow 2 [documentation](https://www.tensorflow.org/guide/gpu), I limit my GPU usage.
**Allocate only as much memory as is actually used (memory growth)**
```
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)
```
**Or restrict which GPU TensorFlow may use (and, if needed, cap how much of its memory it can take)**
```
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only use the first GPU
  try:
    tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
  except RuntimeError as e:
    # Visible devices must be set before GPUs have been initialized
    print(e)
```
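To actually cap how much memory a given GPU may take, the same GPU guide also provides virtual device configuration; a sketch with an arbitrary 4096 MB limit:
```
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    # Cap the first visible GPU at 4096 MB (the limit here is just an example value)
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
```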
Next, set the policy so that the global dtype becomes mixed precision.
```
policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)
```
At this point an INFO message tells you whether your GPU supports mixed precision computation.

Now start building the model; first check whether the computation will actually run on a GPU or on the CPU, since that decides how many units to use.
```
inputs = keras.Input(shape=(784,), name='digits')
if tf.config.list_physical_devices('GPU'):
  print('The model will run with 4096 units on a GPU')
  num_units = 4096
else:
  # Use fewer units on CPUs so the model finishes in a reasonable amount of time
  print('The model will run with 64 units on a CPU')
  num_units = 64
```
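The hidden layers come straight from the tutorial (using the `layers` import from the setup above); the dtype checks below refer to `dense1`:
```
dense1 = layers.Dense(num_units, activation='relu', name='dense_1')
x = dense1(inputs)
dense2 = layers.Dense(num_units, activation='relu', name='dense_2')
x = dense2(x)
```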
Next, confirm that under mixed precision the computations are done in float16 while the variables are stored in float32; the paper covered earlier explains why, so I will not repeat it here.
```
print('x.dtype: %s' % x.dtype.name)  # float16: activations are computed in float16
# 'kernel' is dense1's variable
print('dense1.kernel.dtype: %s' % dense1.kernel.dtype.name)  # float32: variables stay in float32
```

Next, the last layer's computation needs a fix: the final outputs must be float32, but because the global policy was set to mixed precision, the last layer's softmax has to override its dtype.
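As in the linked tutorial, the fix is to force the softmax activation to produce float32:
```
x = layers.Dense(10, name='dense_logits')(x)
outputs = layers.Activation('softmax', dtype='float32', name='predictions')(x)
print('Outputs dtype: %s' % outputs.dtype.name)  # float32
```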

Here I run an experiment comparing mixed precision, fully float32, and fully float16. Since the experiment runs on GPU 1, the memory numbers below are those of GPU 1.
**GPU 0 memory usage**

**Mixed precision**
Result

GPU Memory Usage

**Float32**
Global Policy

Result

GPU memory usage

**Float16**
Result

GPU memory usage

**Float64**
Global Policy

For this run I also changed the label dtype to float64.

GPU memory usage

Result

You can see that pure float16 is the fastest in terms of speed, but the model simply fails to train. Mixed precision is about twice as fast as float32, with almost no difference in accuracy. (I did not use the same random seed, did not raise the number of epochs, and only ran each configuration once, so the procedure has some small flaws; the point was just a quick comparison of speed and GPU memory usage.)
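For reference, switching between these runs only means changing the global policy string; `use_policy` below is a hypothetical helper wrapping the same calls used earlier (the float64 run also casts the labels, as noted above):
```
def use_policy(name):
  # Swap the global dtype policy between experiment runs
  policy = tf.keras.mixed_precision.experimental.Policy(name)
  tf.keras.mixed_precision.experimental.set_policy(policy)

use_policy('mixed_float16')   # mixed precision run
# use_policy('float32')       # fully float32 run
# use_policy('float16')       # fully float16 run
# use_policy('float64')       # float64 run
```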
## Loss Scaling