---
title: mix_precision hands-on
tags: hands-on practice
---

## Refer to the [TensorFlow tutorial](https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/guide/keras/mixed_precision.ipynb)

Overflow & underflow testing of float16
![](https://i.imgur.com/qceNyvm.png =300x)

For the setup, first import TensorFlow (version 2.1); I assign the job to GPU 1.
![Setup](https://i.imgur.com/88FbObc.png)

Following the TensorFlow 2 [documentation](https://www.tensorflow.org/guide/gpu), limit how much GPU memory I use.

**Grow memory allocation only as needed**
```python
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)
```

**Or restrict TensorFlow to a specific GPU**
```python
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only use the first GPU
  try:
    tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
  except RuntimeError as e:
    # Visible devices must be set before GPUs have been initialized
    print(e)
```

Next, set the policy so that the global dtype becomes mixed precision:
```python
policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)
```
An INFO message will appear here telling you whether your GPU can actually do mixed-precision computation.
![](https://i.imgur.com/6JMru38.png)

Start building the model. Before laying out the graph, check whether the computation will really run on a GPU or on a CPU, and size the layers accordingly:
```python
from tensorflow import keras

inputs = keras.Input(shape=(784,), name='digits')
if tf.config.list_physical_devices('GPU'):
  print('The model will run with 4096 units on a GPU')
  num_units = 4096
else:
  # Use fewer units on CPUs so the model finishes in a reasonable amount of time
  print('The model will run with 64 units on a CPU')
  num_units = 64
```

Next, confirm that under mixed precision the computation runs in float16 while the variables are stored in float32; the paper covered earlier explains why this is done, so I won't repeat it here.
```python
# x is the float16 output of dense1 from the model built above
print('x.dtype: %s' % x.dtype.name)
# 'kernel' is dense1's variable
print('dense1.kernel.dtype: %s' % dense1.kernel.dtype.name)
```
![](https://i.imgur.com/OjmyBNO.png =400x)

Next, the last layer has to be fixed up: the final output must be float32, but because the global policy was set to mixed precision earlier, the closing softmax layer is explicitly overridden to float32.
![](https://i.imgur.com/qypeXtm.png)

Here I run an experiment comparing mixed precision, fully float32, and fully float16. Since the experiment runs on GPU 1, the memory numbers below are for GPU 1.

**GPU 0 memory usage**
![](https://i.imgur.com/7mpNBcK.png =200x)

**Mixed precision**
Result
![](https://i.imgur.com/7d9r3Jn.png)
GPU memory usage
![](https://i.imgur.com/AVafIh9.png =200x)

**Float32**
Global policy
![](https://i.imgur.com/lZDGzH4.png)
Result
![](https://i.imgur.com/Q0UYMSU.png)
GPU memory usage
![](https://i.imgur.com/dSbkika.png =200x)

**Float16**
Result
![](https://i.imgur.com/YJ8Q1YH.png)
GPU memory usage
![](https://i.imgur.com/XLz2uHn.png =200x)

**Float64**
Global policy
![](https://i.imgur.com/3BEKJYV.png)
Here I also changed the label dtype slightly, to float64.
![](https://i.imgur.com/nkEIuO3.png)
GPU memory usage
![](https://i.imgur.com/3s6Fxwr.png =200x)
Result
![](https://i.imgur.com/auypDIr.png)

In terms of speed, pure float16 is the fastest but simply fails to train; mixed precision is about twice as fast as float32 with almost no difference in accuracy. (I did not fix the random seed, did not raise the epoch count, and ran each setting only once, so the experimental procedure has some small flaws, but the goal was just a quick comparison of speed and GPU memory usage.)

## Loss Scaling
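The tutorial continues with loss scaling here. As a minimal sketch of the idea, assuming TF 2.1's `tf.keras.mixed_precision.experimental` API and a toy model standing in for the one built above (the layer sizes and the `train_step` name are just placeholders for illustration): `Model.fit` already applies dynamic loss scaling automatically once the `mixed_float16` policy is set, but a custom training loop has to wrap the optimizer in a `LossScaleOptimizer` and scale the loss up / the gradients back down around the gradient tape.

```python
import tensorflow as tf
from tensorflow import keras

# Assumes the mixed_float16 policy has been set, as in the sections above.
tf.keras.mixed_precision.experimental.set_policy('mixed_float16')

# A tiny stand-in model; the real one is the MNIST model built earlier.
inputs = keras.Input(shape=(784,))
x = keras.layers.Dense(64, activation='relu')(inputs)
x = keras.layers.Dense(10)(x)
# Keep the final softmax in float32, as discussed above.
outputs = keras.layers.Activation('softmax', dtype='float32')(x)
model = keras.Model(inputs, outputs)

loss_fn = keras.losses.SparseCategoricalCrossentropy()

# Wrap the optimizer with a dynamic loss scale so that small float16
# gradients do not underflow to zero during backprop.
optimizer = keras.optimizers.RMSprop()
optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    optimizer, loss_scale='dynamic')

@tf.function
def train_step(x_batch, y_batch):
  with tf.GradientTape() as tape:
    predictions = model(x_batch, training=True)
    loss = loss_fn(y_batch, predictions)
    # Multiply the loss by the current loss scale before backprop...
    scaled_loss = optimizer.get_scaled_loss(loss)
  scaled_grads = tape.gradient(scaled_loss, model.trainable_variables)
  # ...then divide the gradients by the same factor before applying them.
  grads = optimizer.get_unscaled_gradients(scaled_grads)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss
```

Dynamic loss scaling is what keeps tiny float16 gradients from underflowing to zero, which is exactly the underflow problem illustrated at the top of this note.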