Enabling GPU-Accelerated TensorFlow on Windows

Using the Native Windows Environment

Applicable environments:

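Python for Windows
- Python version 3.8.x (tested with 3.8.6 and 3.8.7)
- tensorflow does not currently support 3.9.0 or 3.9.1; the tensorflow installation cannot complete on those versions
- Installed in a Python virtual environment created with the native venv module of Python for Windows
- Any other Python virtual environment not created by Anaconda for Windows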
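
The version requirement above can be checked before installing anything. This is a minimal sketch; the function name `is_supported_python` is made up for illustration, and it only encodes what this note reports (3.8.x works, 3.9.x does not).

```python
import sys

# The minor version this note reports as working; 3.9.x could not install tensorflow.
SUPPORTED_MINOR = (3, 8)


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.8.x."""
    return tuple(version_info[:2]) == SUPPORTED_MINOR


if __name__ == "__main__":
    print(sys.version.split()[0], "supported:", is_supported_python())
```

Run it with the interpreter of the virtual environment you plan to use, since that is the one pip will install into.
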
Installation steps

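Install the Microsoft Visual C++ development tools and runtime
- Runtime: "Microsoft Visual C++ 2015-2019 Redistributable" (included with Visual Studio 2019)
- Development tools: MS Visual Studio (the Community edition is sufficient)
  The CUDA Toolkit includes a C++ extension for Visual Studio. When installing Visual Studio, select the 'Desktop development with C++' workload.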
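Figure: select **Desktop development with C++** during installation.
If you plan to develop C++ programs with the CUDA Toolkit, do not install only "Build Tools for Visual Studio" (MSBuild). The standalone MSBuild and the MSBuild bundled with Visual C++ install to different paths, and reconciling them afterwards is difficult if you are not familiar with the setup.
Install the tensorflow Python package
In the target environment, install it with pip install.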
```
pip install tensorflow
```
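Anaconda and Miniconda users should not install with pip install; use conda install instead.
See the section "Using the Anaconda for Windows Environment".
Notes on tensorflow 2.x:
1. Keras is now integrated into tensorflow; the standalone keras package is no longer developed or supported (no further bug fixes).
2. The CPU and GPU builds have been merged, so installing tensorflow is enough; tensorflow-gpu is not needed.
3. The order of this step does not matter: install the package now, or skip it and install it when you verify the setup.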
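Install the NVIDIA display driver
Download and install the GPU driver for your NVIDIA GPU model.
URL: https://www.nvidia.com.tw/Download/index.aspx?lang=tw

The NVIDIA CUDA Toolkit installer already bundles a reasonably recent, compatible NVIDIA display driver, so on Windows this step is generally optional. (You can decide at the end whether to install it, or install a newer version later.)
- Check the latest status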

CUDA Toolkit and Compatible Driver Versions

| CUDA Toolkit | Linux x86_64 Driver Version | Windows x86_64 Driver Version |
|---|---|---|
| CUDA 11.2.0 GA | >= 460.27.04 | >= 460.89 |
| CUDA 11.1.1 Update 1 | >= 455.32 | >= 456.81 |
| CUDA 11.1 GA | >= 455.23 | >= 456.38 |
| CUDA 11.0.3 Update 1 | >= 450.51.06 | >= 451.82 |
| CUDA 11.0.2 GA | >= 450.51.05 | >= 451.48 |
| CUDA 11.0.1 RC | >= 450.36.06 | >= 451.22 |
| CUDA 10.2.89 | >= 440.33 | >= 441.22 |
| CUDA 10.1 (10.1.105 general release, and updates) | >= 418.39 | >= 418.96 |
| CUDA 10.0.130 | >= 410.48 | >= 411.31 |
| CUDA 9.2 (9.2.148 Update 1) | >= 396.37 | >= 398.26 |
| CUDA 9.2 (9.2.88) | >= 396.26 | >= 397.44 |
| CUDA 9.1 (9.1.85) | >= 390.46 | >= 391.29 |
| CUDA 9.0 (9.0.76) | >= 384.81 | >= 385.54 |
| CUDA 8.0 (8.0.61 GA2) | >= 375.26 | >= 376.51 |
| CUDA 8.0 (8.0.44) | >= 367.48 | >= 369.30 |
| CUDA 7.5 (7.5.16) | >= 352.31 | >= 353.66 |
| CUDA 7.0 (7.0.28) | >= 346.46 | >= 347.62 |
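
The driver table can be turned into a small lookup so the comparison is not done by eye. This is only a sketch: the dictionary holds a few Windows rows copied from the table, and the names `MIN_WINDOWS_DRIVER`, `parse_version` and `driver_is_sufficient` are made up for illustration.

```python
# Minimum Windows driver versions, copied from a few rows of the table in this note.
MIN_WINDOWS_DRIVER = {
    "11.2.0": "460.89",
    "11.1.1": "456.81",
    "11.0.3": "451.82",
    "10.2.89": "441.22",
}


def parse_version(text):
    """Turn '456.81' into (456, 81) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def driver_is_sufficient(cuda_version, installed_driver):
    """Return True if the installed driver meets the minimum for that CUDA release."""
    minimum = MIN_WINDOWS_DRIVER[cuda_version]
    return parse_version(installed_driver) >= parse_version(minimum)
```

For example, `driver_is_sufficient("11.1.1", "465.21")` is true, while a 456.81 driver is too old for CUDA 11.2.0. The installed driver version can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.
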

Install the NVIDIA CUDA Toolkit

Check TensorFlow and CUDA Toolkit version compatibility
URL:

TensorFlow GPU builds: version pairings on Windows

| Tensorflow Version | Python version | Compiler | Build tools | cuDNN | CUDA |
|---|---|---|---|---|---|
| tensorflow_gpu-2.4.0 | 3.6-3.8 | MSVC 2019 | Bazel 3.1.0 | 8.0 | 11.0 |
| tensorflow_gpu-2.3.0 | 3.5-3.8 | MSVC 2019 | Bazel 3.1.0 | 7.6 | 10.1 |
| tensorflow_gpu-2.2.0 | 3.5-3.8 | MSVC 2019 | Bazel 2.0.0 | 7.6 | 10.1 |
| tensorflow_gpu-2.1.0 | 3.5-3.7 | MSVC 2019 | Bazel 0.27.1-0.29.1 | 7.6 | 10.1 |
| tensorflow_gpu-2.0.0 | 3.5-3.7 | MSVC 2017 | Bazel 0.26.1 | 7.4 | 10 |
| tensorflow_gpu-1.15.0 | 3.5-3.7 | MSVC 2017 | Bazel 0.26.1 | 7.4 | 10 |
| tensorflow_gpu-1.14.0 | 3.5-3.7 | MSVC 2017 | Bazel 0.24.1-0.25.2 | 7.4 | 10 |
| tensorflow_gpu-1.13.0 | 3.5-3.7 | MSVC 2015 update 3 | Bazel 0.19.0-0.21.0 | 7.4 | 10 |
| tensorflow_gpu-1.12.0 | 3.5-3.6 | MSVC 2015 update 3 | Bazel 0.15.0 | 7.2 | 9.0 |
| tensorflow_gpu-1.11.0 | 3.5-3.6 | MSVC 2015 update 3 | Bazel 0.15.0 | 7 | 9 |
| tensorflow_gpu-1.10.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9 |
| tensorflow_gpu-1.9.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9 |
| tensorflow_gpu-1.8.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9 |
| tensorflow_gpu-1.7.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9 |
| tensorflow_gpu-1.6.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9 |
| tensorflow_gpu-1.5.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9 |
| tensorflow_gpu-1.4.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 6 | 8 |
| tensorflow_gpu-1.3.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 6 | 8 |
| tensorflow_gpu-1.2.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 5.1 | 8 |
| tensorflow_gpu-1.1.0 | 3.5 | MSVC 2015 update 3 | Cmake v3.6.3 | 5.1 | 8 |
| tensorflow_gpu-1.0.0 | 3.5 | MSVC 2015 update 3 | Cmake v3.6.3 | 5.1 | 8 |

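Download and install the latest compatible version
When I installed, the latest CUDA release was 11.2.0, but no matching cuDNN release was available yet, so I stepped back one version and used CUDA Toolkit 11.1.1.
Download URLs: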
- https://developer.nvidia.com/cuda-toolkit-archive
- https://developer.nvidia.com/cuda-downloads
During installation choose "Custom" and deselect "NVIDIA GeForce Experience".
The "Express" installation also works.

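Install the NVIDIA cuDNN SDK
Download URL: https://developer.nvidia.com/CUDNN
- You need to fill in some registration details.
- Download the Windows build that matches your CUDA Toolkit version.
- Extract the ZIP file and copy its three subdirectories into the corresponding subdirectories of the CUDA Toolkit.
  - bin <–> bin
  - lib <–> lib
  - include <–> include
Check the environment variables
- Is CUDA_PATH set to the correct installation path?
- Is CUDA_PATH_Vxx_y set to the correct installation path?
- Reboot (required)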
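
The environment-variable and cuDNN copy checks above can be scripted. The sketch below is an assumption-laden helper, not part of any NVIDIA tooling: `check_cuda_install` is a made-up name, and the default DLL name `cudnn64_8.dll` is taken from the log output later in this note, so adjust it for other cuDNN versions.

```python
import os
from pathlib import Path


def check_cuda_install(env=os.environ, cudnn_dll="cudnn64_8.dll"):
    """Return a list of problems found; an empty list means the checks passed."""
    cuda_path = env.get("CUDA_PATH")
    if not cuda_path:
        return ["CUDA_PATH is not set"]

    problems = []
    root = Path(cuda_path)
    if not root.is_dir():
        problems.append(f"CUDA_PATH does not exist: {root}")
    # The cuDNN bin directory should have been copied into the toolkit's bin directory.
    if not (root / "bin" / cudnn_dll).is_file():
        problems.append(f"{cudnn_dll} not found in {root / 'bin'}")
    return problems


if __name__ == "__main__":
    for line in check_cuda_install() or ["CUDA_PATH and cuDNN look fine"]:
        print(line)
```

Passing `env` as a plain dictionary makes the function easy to try without touching the real environment.
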

Verify the installation

Start python or open a Jupyter notebook and enter the following code:

```
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
```

If a warning like the following appears (the letter after the timestamp is W instead of I):
```
W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
```
- Fix 1: In C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin (or the directory for your version), find the similarly named file cusolver64_11.dll and rename it to the name in the message (cusolver64_10.dll). Copying it also works.
- Fix 2: Download cusolver64_10.dll from the link and place it in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin
- Ref: tensorflow bug reports #44291, #44159
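
Fix 1 (the copy variant) can be sketched in a few lines of Python. The helper name `alias_cusolver` is made up for illustration; it copies rather than renames, so the original cusolver64_11.dll stays in place.

```python
import shutil
from pathlib import Path


def alias_cusolver(bin_dir, source="cusolver64_11.dll", target="cusolver64_10.dll"):
    """Copy source to target inside bin_dir unless target already exists."""
    bin_dir = Path(bin_dir)
    src, dst = bin_dir / source, bin_dir / target
    if dst.exists():
        return dst  # nothing to do
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; check the CUDA version in the path")
    shutil.copy2(src, dst)
    return dst
```

Call it with the bin directory named above, for example `alias_cusolver(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin")`. Writing under Program Files normally needs an elevated (administrator) prompt.
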

Example python output (if a GPU device appears at the bottom, the setup is correct):

(vGPU) C:\Works\OpenCV>python
Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
2021-01-20 10:20:34.772345: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
>>>
>>> device_lib.list_local_devices()
2021-01-20 10:20:40.590963: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-20 10:20:40.594219: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-20 10:20:41.256771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:2d:00.0 name: GeForce MX330 computeCapability: 6.1
coreClock: 1.594GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 52.21GiB/s
2021-01-20 10:20:41.257120: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-20 10:20:41.264409: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-20 10:20:41.264607: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-20 10:20:41.268513: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-20 10:20:41.270228: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-20 10:20:41.278077: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-20 10:20:41.282008: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-20 10:20:41.282499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-01-20 10:20:42.142988: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-20 10:20:42.143236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0
2021-01-20 10:20:42.143350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N
2021-01-20 10:20:42.144514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 1342 MB memory) -> physical GPU (device: 0, name: GeForce MX330, pci bus id: 0000:2d:00.0, compute capability: 6.1)
2021-01-20 10:20:42.146561: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5028390217096499099
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 1408043828
locality {
bus_id: 1
links {
}
}
incarnation: 11065352565844271525
physical_device_desc: "device: 0, name: GeForce MX330, pci bus id: 0000:2d:00.0, compute capability: 6.1"
]
>>> exit()

Using the Anaconda for Windows Environment

The Anaconda environments that currently support GPU are listed in the table below.

| No. | Python | TensorFlow | CUDA Toolkit | cuDNN |
|---|---|---|---|---|
| 1 | 3.6 | 1.9 | 9.0 | 7.0.3 ~ 7.6.5 |
| 2 | 3.6 | 1.13~1.15 | 10.0.130 | 7.3.0 ~ 7.6.5 |
| 3 | 3.7 | 1.13~1.15 | 10.0.130 | 7.3.0 ~ 7.6.5 |
| 4 | 3.7 | 2.1 | 10.1.243 | 7.5.0 ~ 7.6.5 |
| 5 | 3.8 | NA | NA | NA |

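Installation commands

```
conda install tensorflow-gpu==1.15
# or
conda install tensorflow==2.1
```

Check that the package list shown during installation includes cudatoolkit and cudnn.

Do not install with pip install; it does not include cudatoolkit and cudnn.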
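
The check for cudatoolkit and cudnn can also be done after the fact from the output of `conda list --json`. This is a rough sketch: `missing_gpu_packages` and `REQUIRED` are made-up names, and the function only looks at package names, not versions.

```python
import json

# Packages this note says must appear in the conda environment.
REQUIRED = {"cudatoolkit", "cudnn"}


def missing_gpu_packages(conda_list_json):
    """Return the required package names absent from `conda list --json` output."""
    installed = {pkg["name"] for pkg in json.loads(conda_list_json)}
    return sorted(REQUIRED - installed)
```

An empty list means both packages are present in the environment. The JSON text can be captured with `subprocess.run(["conda", "list", "--json"], capture_output=True, text=True).stdout` inside the activated environment.
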

Verify the installation

Start python or open a Jupyter notebook and enter:

```
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
```

WSL2 + Docker environment

Starting with Docker Desktop 3.1.0, WSL 2 GPU Paravirtualization (GPU-PV) is supported on NVIDIA GPUs.

System requirements

Docker Desktop upgraded to version 3.1.0 (or later)
Windows 10 Insider version (build 20150 or later). Currently version 2004, build ~~21292~~ 21301 (note that this is not the already released 20H2).
Figure: current Windows 10 Insider version (2021/01/21)
Figure: current Windows 10 Insider version (2021/01/23)
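
The build requirement can be checked from Python as well. This is a sketch under the assumption that the version string has the form `10.0.<build>`, which is what `platform.version()` returns on Windows 10; the names `MIN_BUILD`, `build_number` and `supports_gpu_pv` are made up for illustration.

```python
import platform

MIN_BUILD = 20150  # the Insider build named in the system requirements


def build_number(version_string):
    """Extract the build from a string such as '10.0.21301'."""
    return int(version_string.split(".")[2])


def supports_gpu_pv(version_string):
    """Return True when the Windows build is new enough for WSL 2 GPU-PV."""
    return build_number(version_string) >= MIN_BUILD


if __name__ == "__main__" and platform.system() == "Windows":
    print(platform.version(), "GPU-PV capable:", supports_gpu_pv(platform.version()))
```

A released 20H2 system reports build 19042 and fails the check, which matches the note above.
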
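Beta display drivers from NVIDIA that support WSL 2 GPU Paravirtualization
- You must register as an NVIDIA developer and sign in to download them.
- The current GeForce version is 465.21.
Update the WSL 2 Linux kernel to the latest version.
Open a cmd window as administrator and update with the following command: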
```
wsl --update
```
After the update, WSL must be restarted (and so must Docker, because Docker also runs two WSL images).
```
wsl --shutdown 
```
Rebooting the PC is the simplest option.
Make sure the WSL 2 backend option is enabled in Docker Desktop.
Figure: enabling the WSL 2 backend
In addition, to use the docker (and docker-compose) commands inside a Linux distro you do not need to install them there. Just enable WSL Integration globally, or enable it individually for the distros that need it.
Figure: enabling WSL Integration

Incidentally, the docker and docker-compose commands also work in the git-bash environment of Git for Windows.

Verification

Run the following in a cmd console window:

```
docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
```

Naturally, you can also run it inside a WSL distro.

```
wsl bash
```

Once inside (I tested Ubuntu 20.04, CentOS 7, and CentOS 8, and all were OK), run the same command:

```
docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
```

The output looks like this:

Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "GeForce MX330" with compute capability 6.1

> Compute 6.1 CUDA device: [GeForce MX330]
3072 bodies, total time for 10 iterations: 2.427 ms
= 38.888 billion interactions per second
= 777.752 single-precision GFLOP/s at 20 flops per interaction

Check that your GPU is listed in the information at the end of the output (GeForce MX330 in the example above).
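
If you save the benchmark output to a file or a string, the GPU check can be automated. This is a minimal sketch that relies only on the `GPU Device N: "<name>"` line format visible in the sample output; the function name `detected_gpus` is made up for illustration.

```python
import re


def detected_gpus(nbody_output):
    """Return the device names reported on 'GPU Device N: "<name>"' lines."""
    pattern = r'^GPU Device \d+: "([^"]+)"'
    return re.findall(pattern, nbody_output, flags=re.MULTILINE)
```

An empty list means the container did not see any GPU, so the driver, WSL kernel, or Docker Desktop settings need another look.
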

The WSL2 + Docker environment does not work with the tensorflow/tensorflow Docker image published on the TensorFlow website. Download the Docker images you need from https://ngc.nvidia.com/catalog/containers instead.

Environments not yet supported

VM environment
VM + Docker environment

"VM environment" here means VirtualBox or VMware Workstation Pro.


tags: Windows, WSL2, tensorflow, GPU acceleration, python, Python for Windows, docker for windows
