---
title: Tensorflow GPU
---
<style>
div.center > table { font-size:90%; margin-left: 5em; margin-right:5em }
pre, code { font-family: 'Courier New' !important; }
th { text-align:center!important; background-color: gray; color: white }
th, td { margin: 0; padding: 0.25em !important }
td code { padding: 0 !important; background-color: transparent !important }
pre { margin:0 !important; padding: 0.5em !important; border-radius: 5px!important }
pre#python_out code { white-space:pre-wrap!important }
img { box-shadow: 4px 4px 8px }
img + p { margin-top:0.5em!important; text-align:center }
</style>
Enabling GPU-accelerated TensorFlow on Windows
===
# Using the native Windows environment
## Supported environments:
* Python for Windows
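* Python version **3.8.x** (3.8.6 and 3.8.7 tested)
* tensorflow does not currently support 3.9.0 or 3.9.1; the tensorflow installation cannot complete on them
* Installed into a Python **virtual environment** created with `venv`, the module built into Python for Windows
* Any other Python virtual environment **not created by Anaconda** for Windows
* Installation steps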
1. Install the Microsoft Visual C++ development tools and runtime
* **Runtime**: "Microsoft Visual C++ 2015-2019 Redistributable" (included with Visual Studio 2019)
* **Development tools**: MS **Visual Studio** (the **Community** edition is enough)
The CUDA Toolkit ships a C++ extension for **Visual Studio**. When installing Visual Studio, select '**Desktop development with C++**'.
<div style='width:90%; margin:auto;'>
<img src="https://pic.pimg.tw/magicjackting/1611133202-2539863528-g_l.png" title="MS VC++.png" alt="MS VC++.png" />
<p>Select <b>Desktop development with C++</b> during installation</p>
</div>
:::warning
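If you plan to develop C++ programs with the CUDA Toolkit, do not install only "Build Tools for Visual Studio" (MSBuild). A standalone MSBuild installs to a different path than the MSBuild bundled with Visual C++, and re-integrating the two is fairly difficult if you are not familiar with them.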
:::
2. Install the Python package **tensorflow**
Install it with `pip install` in the environment where you need it.
```bash
pip install tensorflow
```
:::danger
Anaconda and Miniconda users: do not install with `pip install`; always use `conda install` instead.
See the section **`Using the Anaconda for Windows environment`**.
:::
:::warning
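**Note** for tensorflow 2.x:
1. Keras is now integrated into tensorflow; the original standalone keras is **no longer developed or supported (no bug fixes)**.
2. The CPU and GPU builds have been merged, so installing tensorflow is enough and tensorflow-gpu is not needed.
3. The order of this step does not matter: install it now, or skip it and install it when you verify the setup.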
:::
3. Install the NVIDIA display driver
Download and install the GPU driver that matches your NVIDIA GPU model.
URL: https://www.nvidia.com.tw/Download/index.aspx?lang=tw
:::info
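The NVIDIA CUDA Toolkit installer already bundles a **reasonably recent, compatible** NVIDIA display driver, so on Windows this step is generally optional. (You can decide at the end whether to install it, or install a newer version later.)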
:::
* [Check the latest status](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions)
<p style='text-align:center; font-weight:bold; font-size:1.2em'>CUDA Toolkit and Compatible Driver Versions</p>
<div class='center'>
CUDA Toolkit | Linux x86_64<br />Driver Version | Windows x86_64<br />Driver Version
---|:---|:---
CUDA 11.2.0 GA | >= 460.27.04 | >= 460.89
CUDA 11.1.1 Update 1 | >= 455.32 | >= 456.81
CUDA 11.1 GA | >= 455.23 | >= 456.38
CUDA 11.0.3 Update 1 | >= 450.51.06 | >= 451.82
CUDA 11.0.2 GA | >= 450.51.05 | >= 451.48
CUDA 11.0.1 RC | >= 450.36.06 | >= 451.22
CUDA 10.2.89 | >= 440.33 | >= 441.22
CUDA 10.1 (10.1.105 general release, and updates) | >= 418.39 | >= 418.96
CUDA 10.0.130 | >= 410.48 | >= 411.31
CUDA 9.2 (9.2.148 Update 1) | >= 396.37 | >= 398.26
CUDA 9.2 (9.2.88) | >= 396.26 | >= 397.44
CUDA 9.1 (9.1.85) | >= 390.46 | >= 391.29
CUDA 9.0 (9.0.76) | >= 384.81 | >= 385.54
CUDA 8.0 (8.0.61 GA2) | >= 375.26 | >= 376.51
CUDA 8.0 (8.0.44) | >= 367.48 | >= 369.30
CUDA 7.5 (7.5.16) | >= 352.31 | >= 353.66
CUDA 7.0 (7.0.28) | >= 346.46 | >= 347.62
</div>
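The table lookup can also be done in code. The following is only a minimal sketch: the dictionary holds a few Windows rows copied from the table above, and `parse_version` and `driver_is_new_enough` are hypothetical helper names, not part of any NVIDIA tool.

```python
# Minimum Windows driver versions, copied from a few rows of the table above.
MIN_WINDOWS_DRIVER = {
    "11.2.0": "460.89",
    "11.1.1": "456.81",
    "11.1": "456.38",
    "11.0.3": "451.82",
    "10.2.89": "441.22",
}


def parse_version(text):
    """Turn '456.81' into (456, 81) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(cuda_version, installed_driver):
    """Return True if installed_driver meets the minimum for cuda_version."""
    minimum = MIN_WINDOWS_DRIVER[cuda_version]
    return parse_version(installed_driver) >= parse_version(minimum)


if __name__ == "__main__":
    print(driver_is_new_enough("11.1.1", "461.09"))  # newer than 456.81
    print(driver_is_new_enough("11.1.1", "451.82"))  # older than 456.81
```

The installed driver version is shown by `nvidia-smi`; pass that string as `installed_driver`.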
4. Install the **NVIDIA CUDA Toolkit**
* Check the version compatibility between tensorflow and the CUDA Toolkit
URLs:
* https://www.tensorflow.org/install/source_windows#gpu
* https://docs.nvidia.com/cuda/archive/11.1.1/cuda-toolkit-release-notes/index.html#title-new-features
* [Check the latest status](https://www.tensorflow.org/install/source_windows#gpu)
<p style='text-align:center; font-weight:bold; font-size:1.2em'>TensorFlow GPU version compatibility on Windows</p>
<div class='center'>
Tensorflow<br />Version | Python<br />version | Compiler | Build tools | cuDNN | CUDA
---|:---:|---|---|:---:|:---:
tensorflow_gpu-2.4.0 | 3.6-3.8 | MSVC 2019 | Bazel 3.1.0 | 8.0 | 11.0
tensorflow_gpu-2.3.0 | 3.5-3.8 | MSVC 2019 | Bazel 3.1.0 | 7.6 | 10.1
tensorflow_gpu-2.2.0 | 3.5-3.8 | MSVC 2019 | Bazel 2.0.0 | 7.6 | 10.1
tensorflow_gpu-2.1.0 | 3.5-3.7 | MSVC 2019 | Bazel 0.27.1-0.29.1 | 7.6 | 10.1
tensorflow_gpu-2.0.0 | 3.5-3.7 | MSVC 2017 | Bazel 0.26.1 | 7.4 | 10
tensorflow_gpu-1.15.0 | 3.5-3.7 | MSVC 2017 | Bazel 0.26.1 | 7.4 | 10
tensorflow_gpu-1.14.0 | 3.5-3.7 | MSVC 2017 | Bazel 0.24.1-0.25.2 | 7.4 | 10
tensorflow_gpu-1.13.0 | 3.5-3.7 | MSVC 2015 update 3 | Bazel 0.19.0-0.21.0 | 7.4 | 10
tensorflow_gpu-1.12.0 | 3.5-3.6 | MSVC 2015 update 3 | Bazel 0.15.0 | 7.2 | 9.0
tensorflow_gpu-1.11.0 | 3.5-3.6 | MSVC 2015 update 3 | Bazel 0.15.0 | 7 | 9
tensorflow_gpu-1.10.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9
tensorflow_gpu-1.9.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9
tensorflow_gpu-1.8.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9
tensorflow_gpu-1.7.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9
tensorflow_gpu-1.6.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9
tensorflow_gpu-1.5.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9
tensorflow_gpu-1.4.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 6 | 8
tensorflow_gpu-1.3.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 6 | 8
tensorflow_gpu-1.2.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 5.1 | 8
tensorflow_gpu-1.1.0 | 3.5 | MSVC 2015 update 3 | Cmake v3.6.3 | 5.1 | 8
tensorflow_gpu-1.0.0 | 3.5 | MSVC 2015 update 3 | Cmake v3.6.3 | 5.1 | 8
</div>
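To check an environment against this table before installing, something like the sketch below could be used. It covers only a handful of rows copied from the table, and `TF_REQUIREMENTS` and `python_supported` are illustrative names I made up, not a tensorflow API.

```python
import sys

# Supported Python range plus cuDNN/CUDA versions, copied from the table above.
TF_REQUIREMENTS = {
    "2.4.0": {"python": ((3, 6), (3, 8)), "cudnn": "8.0", "cuda": "11.0"},
    "2.3.0": {"python": ((3, 5), (3, 8)), "cudnn": "7.6", "cuda": "10.1"},
    "2.1.0": {"python": ((3, 5), (3, 7)), "cudnn": "7.6", "cuda": "10.1"},
    "1.15.0": {"python": ((3, 5), (3, 7)), "cudnn": "7.4", "cuda": "10"},
}


def python_supported(tf_version, python_version=None):
    """Return True if the (major, minor) Python version is in the table's range."""
    if python_version is None:
        python_version = sys.version_info[:2]  # the interpreter running this code
    low, high = TF_REQUIREMENTS[tf_version]["python"]
    return low <= tuple(python_version[:2]) <= high


if __name__ == "__main__":
    print(python_supported("2.4.0", (3, 8)))  # inside 3.6-3.8
    print(python_supported("2.4.0", (3, 9)))  # outside 3.6-3.8
```

This matches the note at the top of this section: Python 3.9 falls outside the 3.6-3.8 range listed for tensorflow 2.4.0.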
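* Download and install the latest **compatible** version
When I installed, the latest CUDA release was 11.2.0, but cuDNN had no matching release yet, so I stepped back one version to **`CUDA Toolkit 11.1.1`**. Note that the table above lists CUDA 11.0 for tensorflow 2.4.0, so running it on 11.1.1 is what leads to the `cusolver64_10.dll` message handled in step 7.
Download URLs: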
* https://developer.nvidia.com/cuda-toolkit-archive
* https://developer.nvidia.com/cuda-downloads
* During installation choose "**Custom**" and deselect "**NVIDIA GeForce Experience**"
* Or simply choose the "**Express**" installation.
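5. Install the **NVIDIA cuDNN SDK**
Download URL: https://developer.nvidia.com/CUDNN
* You will need to fill in some registration details.
* Download the Windows build that matches your CUDA Toolkit.
* Extract the ZIP file and copy its three subdirectories into the corresponding subdirectories of the CUDA Toolkit.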
* bin <--> bin
* lib <--> lib
* include <--> include
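The copy in step 5 can also be scripted. Below is a rough sketch using only the standard library; `merge_cudnn` is a name I made up, and both directory arguments are whatever paths apply on your machine (the extracted cuDNN folder and the CUDA Toolkit folder).

```python
import shutil
from pathlib import Path


def merge_cudnn(cudnn_root, cuda_root, subdirs=("bin", "lib", "include")):
    """Copy each cuDNN subdirectory into the same-named CUDA Toolkit subdirectory."""
    copied = []
    for name in subdirs:
        source = Path(cudnn_root) / name
        if not source.is_dir():
            continue  # skip a subdirectory the extracted archive does not contain
        # dirs_exist_ok (Python 3.8+) merges into the existing target directory
        shutil.copytree(source, Path(cuda_root) / name, dirs_exist_ok=True)
        copied.append(name)
    return copied
```

Files already present in the CUDA Toolkit directories are left alone unless cuDNN ships a file with the same name. Writing under `C:\Program Files` normally requires an elevated (administrator) prompt.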
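6. Check the environment variables
* Is `CUDA_PATH` set to the correct installation path?
* Is `CUDA_PATH_Vxx_y` (for example `CUDA_PATH_V11_1` for CUDA 11.1) set to the correct installation path?
* Reboot (required)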
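A quick way to run the check in step 6 from Python is sketched below. `check_cuda_env` is a hypothetical helper; it only tests that each variable points at an existing directory, not that the directory really contains a CUDA Toolkit.

```python
import os
import re
from pathlib import Path


def check_cuda_env(env=None):
    """Map each CUDA_PATH / CUDA_PATH_Vxx_y variable to True if its directory exists."""
    env = os.environ if env is None else env
    pattern = re.compile(r"^CUDA_PATH(_V\d+_\d+)?$", re.IGNORECASE)
    return {
        name: Path(value).is_dir()
        for name, value in env.items()
        if pattern.match(name)
    }


if __name__ == "__main__":
    for name, ok in check_cuda_env().items():
        print(name, "OK" if ok else "MISSING DIRECTORY")
```

An empty result means none of the variables are set in the current process, which can also happen if the console was opened before the installer ran.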
7. Verify the installation
* Start python or open a jupyter notebook and enter the following code:
```python
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
```
* If an error message appears (**the timestamp is followed by `W` instead of `I`**)
```plaintext
W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
```
* Fix 1: go to `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin` (or the directory for your version), find the **similarly named** file `cusolver64_11.dll`, and **rename** it to the name in the message (`cusolver64_10.dll`); making a **copy** works too.
* Fix 2: follow the link to [download `cusolver64_10.dll`](https://drive.google.com/file/d/1-3Yk-QZ1eUta1T40BaxFpO4uyn7BPU4o) and place it in `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin`
* Ref: tensorflow bug reports [#44291](https://github.com/tensorflow/tensorflow/issues/44291), [#44159](https://github.com/tensorflow/tensorflow/issues/44159)
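Fix 1 (the copy variant) could be scripted roughly as follows. This is only a sketch: `alias_dll` is a made-up helper name, and the default file names are the ones from the message above.

```python
import shutil
from pathlib import Path


def alias_dll(bin_dir, existing="cusolver64_11.dll", wanted="cusolver64_10.dll"):
    """Copy `existing` to `wanted` inside bin_dir unless `wanted` is already there."""
    source = Path(bin_dir) / existing
    target = Path(bin_dir) / wanted
    if target.exists():
        return False  # the library tensorflow asks for is already present
    if not source.is_file():
        raise FileNotFoundError(source)
    shutil.copy2(source, target)  # copy instead of rename, so the original stays
    return True
```

Call it with the `bin` directory of your CUDA Toolkit from an elevated prompt, since that directory is normally write-protected.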
Example python output: (if **GPU** shows up at the bottom, everything is working)
```plaintext{ #python_out }
(vGPU) C:\Works\OpenCV>python
Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
2021-01-20 10:20:34.772345: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
>>>
>>> device_lib.list_local_devices()
2021-01-20 10:20:40.590963: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-20 10:20:40.594219: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-20 10:20:41.256771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:2d:00.0 name: GeForce MX330 computeCapability: 6.1
coreClock: 1.594GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 52.21GiB/s
2021-01-20 10:20:41.257120: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-20 10:20:41.264409: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-20 10:20:41.264607: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-20 10:20:41.268513: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-20 10:20:41.270228: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-20 10:20:41.278077: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-20 10:20:41.282008: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-20 10:20:41.282499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-01-20 10:20:42.142988: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-20 10:20:42.143236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-01-20 10:20:42.143350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-01-20 10:20:42.144514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 1342 MB memory) -> physical GPU (device: 0, name: GeForce MX330, pci bus id: 0000:2d:00.0, compute capability: 6.1)
2021-01-20 10:20:42.146561: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5028390217096499099
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 1408043828
locality {
bus_id: 1
links {
}
}
incarnation: 11065352565844271525
physical_device_desc: "device: 0, name: GeForce MX330, pci bus id: 0000:2d:00.0, compute capability: 6.1"
]
>>> exit()
```
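With this much log output it is easy to miss a `W` line. The sketch below scans captured log text for non-`I` lines that report a library that could not be loaded. The regular expressions are my own guess at the format, based only on the lines shown in this note, and `missing_libraries` is a hypothetical helper.

```python
import re

# Matches e.g. "2021-01-20 10:20:41.278077: I tensorflow/...cc:49] message".
# The timestamp is optional because the warning example above is shown without one.
LOG_LINE = re.compile(r"^(?:\d{4}-\d{2}-\d{2} [\d:.]+: )?([IWEF]) (\S+)\] (.*)$")
MISSING_DLL = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the library names reported as not loadable on non-info log lines."""
    missing = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match or match.group(1) == "I":
            continue  # not a log line, or just an informational one
        dll = MISSING_DLL.search(match.group(3))
        if dll:
            missing.append(dll.group(1))
    return missing
```

An empty list for the output above means every library was opened; a result such as `['cusolver64_10.dll']` points at the fixes described earlier in this step.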
<br />
---
# Using the Anaconda for Windows environment
* Anaconda currently supports GPU in the environments listed below
No. | Python | TensorFlow | CUDA Toolkit | cuDNN
:-:|:--:|:---:|:---:|:---:
1 | 3.6 | 1.9 | 9.0 | 7.0.3 ~ 7.6.5
2 | 3.6 | 1.13~1.15 | 10.0.130 | 7.3.0 ~ 7.6.5
3 | 3.7 | 1.13~1.15 | 10.0.130 | 7.3.0 ~ 7.6.5
4 | 3.7 | 2.1 | 10.1.243 | 7.5.0 ~ 7.6.5
5 | 3.8 | NA | NA | NA
* Installation commands
```cmd
conda install tensorflow-gpu==1.15
REM or
conda install tensorflow==2.1
```
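Check that the packages listed during installation include `cudatoolkit` and `cudnn`
* Do not install with `pip install`; it does not include `cudatoolkit` or `cudnn`
* Verify the installation
* Start python or open a jupyter notebook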
```python
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
```
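The install step above asks you to confirm that `cudatoolkit` and `cudnn` were pulled in. If the install log has already scrolled away, the output of `conda list --json` can be checked afterwards. The following is only a sketch: `missing_gpu_packages` is a made-up name, and it assumes the JSON is a list of objects that each have a `name` field.

```python
import json

REQUIRED = ("cudatoolkit", "cudnn")


def missing_gpu_packages(conda_list_json):
    """Return the required GPU packages absent from `conda list --json` output."""
    installed = {entry["name"] for entry in json.loads(conda_list_json)}
    return [name for name in REQUIRED if name not in installed]
```

Save the output with `conda list --json > packages.json` in the activated environment and pass the file contents to the function; an empty list means both packages are installed.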
<br />
---
# WSL2 + docker environment
Starting with version 3.1.0, **Docker Desktop** supports **WSL 2 GPU Paravirtualization (GPU-PV)** on NVIDIA GPUs.
## System requirements
* **Docker Desktop** upgraded to version 3.1.0 (or later)
* Windows 10 **Insider version** (build 20150 or later). Currently 2004 build ~~21292~~ 21301 (**not** the already released **20H2**)
<div style='width:90%; margin:auto'>
<img src="https://pic.pimg.tw/magicjackting/1611198471-2120743177-g.png" border="0" title="windows 版本.png" alt="windows 版本.png" />
<p>Current Windows 10 Insider build (2021/01/21)</p>
<img src="https://pic.pimg.tw/magicjackting/1611399841-376457745-g.png" border="0" title="windows 版本-1.png" alt="windows 版本-1.png" />
<p>Current Windows 10 Insider build (2021/01/23)</p>
</div>
* [Beta Display Drivers](https://developer.nvidia.com/cuda/wsl) from NVIDIA supporting WSL 2 GPU Paravirtualization
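* You must register as an NVIDIA developer and sign in before you can download
* Current GeForce version: 465.21
* Update the WSL 2 **Linux kernel** to the latest version.
Open a cmd window **as administrator** and update with **`wsl --update`**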
```cmd
wsl --update
```
After the update, WSL must be restarted (and so must docker, because docker also has two WSL distros of its own).
```cmd
wsl --shutdown
```
Simply rebooting the PC is the more straightforward option.
* Make sure the **WSL 2 backend** option is enabled in **Docker Desktop**
<div style='width:90%; margin:auto'>
<img src="https://pic.pimg.tw/magicjackting/1611199949-7559806-g_l.png" title="Enable WSL 2 backend.png" alt="Enable WSL 2 backend.png" />
<p>Enable the WSL 2 backend</p>
</div>
Also, a Linux `distro` does not need its own installation to use the `docker` (and `docker-compose`) commands: just **enable** **WSL Integration** globally, or **enable** it individually on the `distro`s that need it.
<div style='width:90%; margin:auto'>
<img src="https://pic.pimg.tw/magicjackting/1611199949-3572841595-g_l.png" title="Eanble WSL Integration.png" alt="Eanble WSL Integration.png" />
<p>Enable WSL Integration</p>
</div>
Incidentally, these `docker` and `docker-compose` commands also work in the `git-bash` environment of `git for windows`.
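The two numeric requirements above (Windows build 20150 or later, Docker Desktop 3.1.0 or later) are easy to misread, so here is a small sketch that compares them. `wsl_gpu_requirement_problems` is a hypothetical helper; you supply the build number (on Windows, `sys.getwindowsversion().build` returns it) and the Docker Desktop version string yourself.

```python
def version_tuple(text):
    """Turn '3.1.0' into (3, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def wsl_gpu_requirement_problems(windows_build, docker_desktop_version):
    """Return a list of unmet requirements; an empty list means both are satisfied."""
    problems = []
    if windows_build < 20150:
        problems.append("Windows build %d is older than 20150" % windows_build)
    if version_tuple(docker_desktop_version) < (3, 1, 0):
        problems.append(
            "Docker Desktop %s is older than 3.1.0" % docker_desktop_version
        )
    return problems


if __name__ == "__main__":
    # 21301 is the Insider build mentioned above; 19042 is the released 20H2.
    print(wsl_gpu_requirement_problems(21301, "3.1.0"))
    print(wsl_gpu_requirement_problems(19042, "3.0.4"))
```

The driver and WSL kernel requirements are not covered by this sketch and still have to be checked by hand.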
## Verification
* Run the following in a `cmd` console window
```cmd
docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
```
Naturally, you can also run it inside a WSL `distro`.
```cmd
wsl bash
```
Once inside (I tested `Ubuntu-20.04`, `CentOS7`, and `CentOS8`, all OK), run the same command:
```bash
docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
```
* The output looks like this:
```plaintext
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "GeForce MX330" with compute capability 6.1
> Compute 6.1 CUDA device: [GeForce MX330]
3072 bodies, total time for 10 iterations: 2.427 ms
= 38.888 billion interactions per second
= 777.752 single-precision GFLOP/s at 20 flops per interaction
```
<br />
Check that your GPU appears in the final lines of the output (GeForce MX330 in the example above)
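If you capture the benchmark output in a script, the GPU line can be picked out with a regular expression. This is a sketch based only on the sample output above; `detected_gpus` is a made-up helper and the pattern may need adjusting for other versions of the sample.

```python
import re

# Matches e.g.: GPU Device 0: "GeForce MX330" with compute capability 6.1
DEVICE_LINE = re.compile(
    r'GPU Device (\d+): "([^"]+)" with compute capability ([\d.]+)'
)


def detected_gpus(nbody_output):
    """Return (index, name, compute capability) for each GPU the benchmark reports."""
    return [
        (int(index), name, capability)
        for index, name, capability in DEVICE_LINE.findall(nbody_output)
    ]
```

An empty list means the container did not see any GPU, in which case go back over the system requirements.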
:::danger
The **WSL2 + docker** environment does not work with the docker image **tensorflow/tensorflow** published by the tensorflow website; download the docker images you need from https://ngc.nvidia.com/catalog/containers instead
:::
<br />
---
# Environments not yet supported
* VM environments
* VM + docker environments
VM here means VirtualBox or VMware Workstation Pro
###### tags: `Windows`, `WSL2`, `tensorflow`, `GPU acceleration`, `python`, `Python for Windows`, `docker for windows`