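Install the Microsoft Visual C++ development tools and runtime
During installation, select **Desktop development with C++**
If you plan to develop C++ programs with the CUDA Toolkit, do not install only "Build Tools for Visual Studio" (MSBuild). The standalone MSBuild and the MSBuild bundled with Visual C++ install to different paths, and re-integrating them is somewhat difficult if you are not familiar with the tooling.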
Install the python package tensorflow
In the environment where you need it, install with pip install:
pip install tensorflow
Anaconda and Miniconda users: do not install with pip install. Be sure to use conda install instead.
See the section "Using Anaconda for Windows".
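Before or after running the install command, it can help to confirm what is actually present in the active environment. The following is a minimal sketch, not part of the original instructions; the helper name `installed_version` is hypothetical, and it relies only on the standard library module `importlib.metadata` (Python 3.8 or later).

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report whether tensorflow is visible from the interpreter running this script.
print("tensorflow:", installed_version("tensorflow") or "not installed")
```

Run it with the same python executable you intend to use for TensorFlow, since each environment has its own set of packages.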
Note for tensorflow 2.x:
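Install the nVidia Display Driver
Download and install the GPU driver that matches your nVidia GPU model.
URL: https://www.nvidia.com.tw/Download/index.aspx?lang=tw
The nVidia CUDA Toolkit installer already bundles a reasonably recent, compatible nVidia Display Driver, so on Windows this step is generally optional. (You can decide at the end whether to install it, or install a newer version later.)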
CUDA Toolkit and Compatible Driver Versions
CUDA Toolkit | Linux x86_64 Driver Version | Windows x86_64 Driver Version |
---|---|---|
CUDA 11.2.0 GA | >=460.27.04 | >=460.89 |
CUDA 11.1.1 Update 1 | >=455.32 | >=456.81 |
CUDA 11.1 GA | >=455.23 | >=456.38 |
CUDA 11.0.3 Update 1 | >= 450.51.06 | >= 451.82 |
CUDA 11.0.2 GA | >= 450.51.05 | >= 451.48 |
CUDA 11.0.1 RC | >= 450.36.06 | >= 451.22 |
CUDA 10.2.89 | >= 440.33 | >= 441.22 |
CUDA 10.1 (10.1.105 general release, and updates) | >= 418.39 | >= 418.96 |
CUDA 10.0.130 | >= 410.48 | >= 411.31 |
CUDA 9.2 (9.2.148 Update 1) | >= 396.37 | >= 398.26 |
CUDA 9.2 (9.2.88) | >= 396.26 | >= 397.44 |
CUDA 9.1 (9.1.85) | >= 390.46 | >= 391.29 |
CUDA 9.0 (9.0.76) | >= 384.81 | >= 385.54 |
CUDA 8.0 (8.0.61 GA2) | >= 375.26 | >= 376.51 |
CUDA 8.0 (8.0.44) | >= 367.48 | >= 369.30 |
CUDA 7.5 (7.5.16) | >= 352.31 | >= 353.66 |
CUDA 7.0 (7.0.28) | >= 346.46 | >= 347.62 |
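The table can be turned into a quick check of whether an installed driver is recent enough for a given CUDA Toolkit. This is a minimal sketch: the dictionary copies only a few Windows rows from the table, and the names `MIN_WINDOWS_DRIVER`, `parse_version`, and `driver_is_new_enough` are hypothetical. It assumes driver versions compare correctly component by component, which holds for the two-digit minor numbers used in the table.

```python
# Minimum Windows x86_64 driver versions, copied from a few rows of the table.
MIN_WINDOWS_DRIVER = {
    "11.2.0": "460.89",
    "11.1.1": "456.81",
    "11.1": "456.38",
    "11.0.3": "451.82",
    "10.2.89": "441.22",
    "10.1": "418.96",
    "10.0.130": "411.31",
}


def parse_version(text):
    """Turn a dotted version such as '460.27.04' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(cuda_version, driver_version):
    """Compare a driver version against the table minimum for a CUDA release."""
    minimum = MIN_WINDOWS_DRIVER[cuda_version]
    return parse_version(driver_version) >= parse_version(minimum)
```

The installed driver version has to be supplied by you, for example as reported by `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.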
Install the nVidia CUDA Toolkit
Tensorflow GPU build on Windows: version compatibility
Tensorflow Version | Python version | Compiler | Build tools | cuDNN | CUDA |
---|---|---|---|---|---|
tensorflow_gpu-2.4.0 | 3.6-3.8 | MSVC 2019 | Bazel 3.1.0 | 8.0 | 11.0 |
tensorflow_gpu-2.3.0 | 3.5-3.8 | MSVC 2019 | Bazel 3.1.0 | 7.6 | 10.1 |
tensorflow_gpu-2.2.0 | 3.5-3.8 | MSVC 2019 | Bazel 2.0.0 | 7.6 | 10.1 |
tensorflow_gpu-2.1.0 | 3.5-3.7 | MSVC 2019 | Bazel 0.27.1-0.29.1 | 7.6 | 10.1 |
tensorflow_gpu-2.0.0 | 3.5-3.7 | MSVC 2017 | Bazel 0.26.1 | 7.4 | 10 |
tensorflow_gpu-1.15.0 | 3.5-3.7 | MSVC 2017 | Bazel 0.26.1 | 7.4 | 10 |
tensorflow_gpu-1.14.0 | 3.5-3.7 | MSVC 2017 | Bazel 0.24.1-0.25.2 | 7.4 | 10 |
tensorflow_gpu-1.13.0 | 3.5-3.7 | MSVC 2015 update 3 | Bazel 0.19.0-0.21.0 | 7.4 | 10 |
tensorflow_gpu-1.12.0 | 3.5-3.6 | MSVC 2015 update 3 | Bazel 0.15.0 | 7.2 | 9.0 |
tensorflow_gpu-1.11.0 | 3.5-3.6 | MSVC 2015 update 3 | Bazel 0.15.0 | 7 | 9 |
tensorflow_gpu-1.10.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9 |
tensorflow_gpu-1.9.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9 |
tensorflow_gpu-1.8.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9 |
tensorflow_gpu-1.7.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9 |
tensorflow_gpu-1.6.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9 |
tensorflow_gpu-1.5.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9 |
tensorflow_gpu-1.4.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 6 | 8 |
tensorflow_gpu-1.3.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 6 | 8 |
tensorflow_gpu-1.2.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 5.1 | 8 |
tensorflow_gpu-1.1.0 | 3.5 | MSVC 2015 update 3 | Cmake v3.6.3 | 5.1 | 8 |
tensorflow_gpu-1.0.0 | 3.5 | MSVC 2015 update 3 | Cmake v3.6.3 | 5.1 | 8 |
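To avoid misreading a row, the pairing can be kept as a small lookup. The sketch below copies only the newest rows of the table; `TF_GPU_REQUIREMENTS` and `required_cuda_stack` are hypothetical names used for illustration, and the values are stored as strings exactly as they appear in the table.

```python
# (cuDNN, CUDA) pairs copied from the Windows GPU table above.
TF_GPU_REQUIREMENTS = {
    "2.4.0": ("8.0", "11.0"),
    "2.3.0": ("7.6", "10.1"),
    "2.2.0": ("7.6", "10.1"),
    "2.1.0": ("7.6", "10.1"),
    "2.0.0": ("7.4", "10"),
    "1.15.0": ("7.4", "10"),
}


def required_cuda_stack(tf_version):
    """Return the cuDNN and CUDA versions listed for a tensorflow_gpu release."""
    try:
        cudnn, cuda = TF_GPU_REQUIREMENTS[tf_version]
    except KeyError:
        raise ValueError(f"tensorflow_gpu-{tf_version} is not in the copied rows")
    return {"cudnn": cudnn, "cuda": cuda}


print(required_cuda_stack("2.4.0"))
```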
CUDA Toolkit 11.1.1
Install the nVidia cuDNN SDK
Download URL: https://developer.nvidia.com/CUDNN
Check the environment variables
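The text does not list which variables to check, so the following is only a sketch under assumptions: that the CUDA installer has set `CUDA_PATH`, and that the CUDA `bin` directory should appear in `PATH`. The helper name `path_entries_containing` is hypothetical.

```python
import os


def path_entries_containing(path_value, fragment, sep=os.pathsep):
    """Return the PATH entries whose text contains `fragment` (case-insensitive)."""
    fragment = fragment.lower()
    return [
        entry
        for entry in path_value.split(sep)
        if entry and fragment in entry.lower()
    ]


# Assumption: CUDA_PATH is the variable set by the CUDA Toolkit installer.
print("CUDA_PATH:", os.environ.get("CUDA_PATH"))
print("CUDA-related PATH entries:",
      path_entries_containing(os.environ.get("PATH", ""), "cuda"))
```

If the second list is empty, the CUDA DLLs are probably not reachable from the python process.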
Verify the installation
Open python or a jupyter page and enter the following code:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
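If an error message appears (the letter after the timestamp is W rather than I), for example:
W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
then a library (here cusolver64_10.dll) could not be found. Go to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin (or the directory for your version), look for a similarly named file such as cusolver64_11.dll, and rename it to cusolver64_10.dll (copying it also works), keeping it in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin.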
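The copy step can also be scripted. This is a minimal sketch of the workaround described above, using only the standard library; the function name `copy_dll_under_expected_name` is hypothetical. It copies rather than renames, so the original file stays in place.

```python
import shutil
from pathlib import Path


def copy_dll_under_expected_name(bin_dir, existing_name, expected_name):
    """Copy bin_dir/existing_name to bin_dir/expected_name unless it already exists."""
    bin_dir = Path(bin_dir)
    source = bin_dir / existing_name
    target = bin_dir / expected_name
    if not source.is_file():
        raise FileNotFoundError(source)
    if not target.exists():
        shutil.copy2(source, target)  # keep the original, add a copy
    return target


# Example call for the case in the text (not executed here):
# copy_dll_under_expected_name(
#     r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin",
#     "cusolver64_11.dll",
#     "cusolver64_10.dll",
# )
```

Writing under C:\Program Files normally requires an elevated (administrator) prompt.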
Example python output (if a GPU shows up at the bottom, the setup is working):
(vGPU) C:\Works\OpenCV>python
Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
2021-01-20 10:20:34.772345: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
>>>
>>> device_lib.list_local_devices()
2021-01-20 10:20:40.590963: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-20 10:20:40.594219: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-20 10:20:41.256771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:2d:00.0 name: GeForce MX330 computeCapability: 6.1
coreClock: 1.594GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 52.21GiB/s
2021-01-20 10:20:41.257120: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-20 10:20:41.264409: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-20 10:20:41.264607: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-20 10:20:41.268513: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-20 10:20:41.270228: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-20 10:20:41.278077: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-20 10:20:41.282008: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-20 10:20:41.282499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-01-20 10:20:42.142988: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-20 10:20:42.143236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-01-20 10:20:42.143350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-01-20 10:20:42.144514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 1342 MB memory) -> physical GPU (device: 0, name: GeForce MX330, pci bus id: 0000:2d:00.0, compute capability: 6.1)
2021-01-20 10:20:42.146561: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5028390217096499099
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 1408043828
locality {
bus_id: 1
links {
}
}
incarnation: 11065352565844271525
physical_device_desc: "device: 0, name: GeForce MX330, pci bus id: 0000:2d:00.0, compute capability: 6.1"
]
>>> exit()
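In a long session log like this one, a single missing-library warning is easy to overlook. The sketch below scans captured log text for the "Could not load dynamic library" warning format quoted earlier; the regular expression is written against that one example line and is an assumption about the format, and `missing_libraries` is a hypothetical helper name.

```python
import re

# Matches warning (W) lines of the form shown earlier in this article.
_MISSING_LIBRARY = re.compile(
    r"\bW tensorflow/\S+\] Could not load dynamic library '([^']+)'"
)


def missing_libraries(log_text):
    """Return the library names reported as not loadable, in order of appearance."""
    return [match.group(1) for match in _MISSING_LIBRARY.finditer(log_text)]
```

An empty result for the log above is consistent with every library having been opened successfully.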
Using Anaconda for Windows
No. | python | Tensorflow | cuda Toolkit | cuDNN |
---|---|---|---|---|
1 | 3.6 | 1.9 | 9.0 | 7.0.3 ~ 7.6.5 |
2 | 3.6 | 1.13~1.15 | 10.0.130 | 7.3.0 ~ 7.6.5 |
3 | 3.7 | 1.13~1.15 | 10.0.130 | 7.3.0 ~ 7.6.5 |
4 | 3.7 | 2.1 | 10.1.243 | 7.5.0 ~ 7.6.5 |
5 | 3.8 | NA | NA | NA |
conda install tensorflow-gpu==1.15
# or
conda install tensorflow==2.1
conda install also installs cudatoolkit and cudnn.
Do not install with pip install, which does not include cudatoolkit and cudnn.
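If you are unsure whether the active interpreter belongs to a conda environment, one common heuristic is to look for a `conda-meta` directory under the interpreter prefix. The sketch below is only that heuristic, not an official conda API; `is_conda_environment` and `suggested_installer` are hypothetical names.

```python
import os
import sys


def is_conda_environment(prefix=sys.prefix):
    """Heuristic: conda environments contain a 'conda-meta' directory."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


def suggested_installer(prefix=sys.prefix):
    """Pick the install command this article recommends for the environment."""
    return "conda install" if is_conda_environment(prefix) else "pip install"


print("Use:", suggested_installer())
```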
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
Starting with Docker Desktop 3.1.0, WSL 2 GPU Paravirtualization (GPU-PV) is supported on NVIDIA GPUs.
Upgrade Docker Desktop to version 3.1.0 (or later)
Windows 10 Insider version (build 20150 or later). Currently this is 2004 build 21292 / 21301 (not the already released 20H2).
Current Windows 10 Insider version (2021/01/21)
Current Windows 10 Insider version (2021/01/23)
Beta Display Drivers from NVIDIA supporting WSL 2 GPU Paravirtualization
Update the WSL 2 Linux kernel to the latest version.
Open a cmd command window as administrator and update with the following command:
wsl --update
After the update finishes, WSL has to be restarted (and so does docker, because docker also has two WSL images).
wsl --shutdown
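Simply rebooting the PC is the more straightforward option.
Make sure the WSL 2 backend option is enabled in Docker Desktop
Enable the WSL 2 backend
If a linux distro needs the docker (and docker-compose) commands, they do not have to be installed inside it. Just enable WSL Integration globally, or enable it individually on the distros that need it.
Enable WSL Integration
Incidentally, the docker and docker-compose commands also run in the git-bash environment of git for windows.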
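The two numeric prerequisites above (Windows build 20150 or later, Docker Desktop 3.1.0 or later) can be checked with a few lines of code. This is a sketch with hypothetical names; the values are passed in by hand, since how you obtain them depends on your setup (on Windows, `sys.getwindowsversion().build` reports the build number).

```python
MIN_WINDOWS_BUILD = 20150       # minimum Insider build stated above
MIN_DOCKER_DESKTOP = (3, 1, 0)  # minimum Docker Desktop version stated above


def _version_tuple(text):
    """Parse 'major.minor.patch', padding missing components with zeros."""
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def unmet_prerequisites(windows_build, docker_desktop_version):
    """Return a list of human-readable problems; an empty list means both pass."""
    problems = []
    if windows_build < MIN_WINDOWS_BUILD:
        problems.append(f"Windows build {windows_build} < {MIN_WINDOWS_BUILD}")
    if _version_tuple(docker_desktop_version) < MIN_DOCKER_DESKTOP:
        problems.append(f"Docker Desktop {docker_desktop_version} < 3.1.0")
    return problems
```

For example, `unmet_prerequisites(19042, "3.0.4")` reports both problems, 19042 being the released 20H2 build.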
Run the following in a cmd console window:
docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Running it inside a distro also works.
wsl bash
Use the same command there (Ubuntu-20.04, CentOS7, and CentOS8 are all OK):
docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "GeForce MX330" with compute capability 6.1
> Compute 6.1 CUDA device: [GeForce MX330]
3072 bodies, total time for 10 iterations: 2.427 ms
= 38.888 billion interactions per second
= 777.752 single-precision GFLOP/s at 20 flops per interaction
Check that your GPU card is listed in the information near the end of the output (in the example above: GeForce MX330).
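If you capture the benchmark output to a file or a string, the same check can be automated. The sketch below pulls the device name and throughput out of text shaped like the sample run above; the patterns are written against that single sample and are therefore an assumption about the output format, and `summarize_nbody_output` is a hypothetical name.

```python
import re


def summarize_nbody_output(text):
    """Extract the GPU name, compute capability, and GFLOP/s from nbody output."""
    device = re.search(
        r'GPU Device (\d+): "([^"]+)" with compute capability ([\d.]+)', text
    )
    gflops = re.search(r"=\s*([\d.]+) single-precision GFLOP/s", text)
    return {
        "device_name": device.group(2) if device else None,
        "compute_capability": device.group(3) if device else None,
        "gflops": float(gflops.group(1)) if gflops else None,
    }
```

A `device_name` of `None` means no GPU line was found in the captured text, which is the case to investigate.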
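The WSL2 + docker environment cannot use the docker image tensorflow/tensorflow published on the tensorflow website. Download the docker images you need from https://ngc.nvidia.com/catalog/containers instead.
The VM environments mentioned above refer to VirtualBox or VMWare WS Pro.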