---
title: Tensorflow GPU
---
<style>
div.center > table { font-size:90%; margin-left: 5em; margin-right:5em }
pre, code { font-family: 'Courier New' !important; }
th { text-align:center!important; background-color: gray; color: white }
th, td { margin: 0; padding: 0.25em !important }
td code { padding: 0 !important; background-color: transparent !important }
pre { margin:0 !important; padding: 0.5em !important; border-radius: 5px!important }
pre#python_out code { white-space:pre-wrap!important }
img { box-shadow: 4px 4px 8px }
img + p { margin-top:0.5em!important; text-align:center }
</style>

Enabling GPU-Accelerated TensorFlow on Windows
===

# Using the native Windows environment

## Applicable environments:
* Python for Windows
  * Python **3.8.x** (tested with 3.8.6 and 3.8.7)
  * Python 3.9.0 and 3.9.1 are not yet supported by TensorFlow, so the TensorFlow installation fails
* Python **virtual environments** created with the built-in `venv` module of Python for Windows
* Other Python virtual environments on Windows that are **not Anaconda**-based

## Installation steps

1. Install the Microsoft Visual C++ development tools and runtime
   * **Runtime**: "Microsoft Visual C++ 2015-2019 Redistributable" (included with Visual Studio 2019)
   * **Development tools**: MS **Visual Studio** (the **Community** edition is enough)

     The CUDA Toolkit ships C++ extensions for **Visual Studio**. When installing Visual Studio, select the '**Desktop development with C++**' workload.

     <div style='width:90%; margin:auto;'>
     <img src="https://pic.pimg.tw/magicjackting/1611133202-2539863528-g_l.png" title="MS VC++.png" alt="MS VC++.png" />
     <p>Select <b>Desktop development with C++</b> during installation</p>
     </div>

   :::warning
   If you intend to develop C++ programs with the CUDA Toolkit, do not install only "Build Tools for Visual Studio" (MSBuild). The standalone MSBuild and the MSBuild bundled with Visual C++ are installed in different paths, and reconciling the two afterwards is difficult if you are not familiar with them.
   :::
2. Install the Python package **tensorflow**

   Install it with `pip install` in the environment where you need it:
   ```bash
   pip install tensorflow
   ```
   :::danger
   Anaconda and Miniconda users should **not** install it with `pip install`. Use `conda install` instead. See the section **`Using the Anaconda for Windows environment`**.
   :::
   :::warning
   **Please note**, with TensorFlow 2.x:
   1. Keras has been integrated into TensorFlow. The standalone keras package is **no longer developed or supported (no bug fixes)**.
   2. The CPU and GPU builds have been merged, so installing `tensorflow` is enough. `tensorflow-gpu` is not needed.
   3. The order of this step does not matter. You can install it now, or skip it and install it when you verify the setup.
   :::
3. Install the nVidia display driver

   Download and install the driver for your nVidia GPU model from: https://www.nvidia.com.tw/Download/index.aspx?lang=tw

   :::info
   The nVidia CUDA Toolkit installer already bundles a **reasonably recent, compatible** nVidia display driver. On Windows this step is therefore generally optional: you can decide at the end, or install a newer driver later.
   :::

   * [Check the latest status](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions)

   <p style='text-align:center; font-weight:bold; font-size:1.2em'>CUDA Toolkit and Compatible Driver Versions</p>
   <div class='center'>

   CUDA Toolkit | Linux x86_64<br />Driver Version | Windows x86_64<br />Driver Version
   ---|:---|:---
   CUDA 11.2.0 GA | >=460.27.04 | >=460.89
   CUDA 11.1.1 Update 1 | >=455.32 | >=456.81
   CUDA 11.1 GA | >=455.23 | >=456.38
   CUDA 11.0.3 Update 1 | >= 450.51.06 | >= 451.82
   CUDA 11.0.2 GA | >= 450.51.05 | >= 451.48
   CUDA 11.0.1 RC | >= 450.36.06 | >= 451.22
   CUDA 10.2.89 | >= 440.33 | >= 441.22
   CUDA 10.1 (10.1.105 general release, and updates) | >= 418.39 | >= 418.96
   CUDA 10.0.130 | >= 410.48 | >= 411.31
   CUDA 9.2 (9.2.148 Update 1) | >= 396.37 | >= 398.26
   CUDA 9.2 (9.2.88) | >= 396.26 | >= 397.44
   CUDA 9.1 (9.1.85) | >= 390.46 | >= 391.29
   CUDA 9.0 (9.0.76) | >= 384.81 | >= 385.54
   CUDA 8.0 (8.0.61 GA2) | >= 375.26 | >= 376.51
   CUDA 8.0 (8.0.44) | >= 367.48 | >= 369.30
   CUDA 7.5 (7.5.16) | >= 352.31 | >= 353.66
   CUDA 7.0 (7.0.28) | >= 346.46 | >= 347.62

   </div>
4. Install the **nVidia CUDA Toolkit**
   * Check TensorFlow / CUDA Toolkit version compatibility:
     * https://www.tensorflow.org/install/source_windows#gpu
     * https://docs.nvidia.com/cuda/archive/11.1.1/cuda-toolkit-release-notes/index.html#title-new-features
     * [Check the latest status](https://www.tensorflow.org/install/source_windows#gpu)

   <p style='text-align:center; font-weight:bold; font-size:1.2em'>TensorFlow GPU builds on Windows: tested version combinations</p>
   <div class='center'>

   Tensorflow<br />Version | Python<br />version | Compiler | Build tools | cuDNN | CUDA
   ---|:---:|---|---|:---:|:---:
   tensorflow_gpu-2.4.0 | 3.6-3.8 | MSVC 2019 | Bazel 3.1.0 | 8.0 | 11.0
   tensorflow_gpu-2.3.0 | 3.5-3.8 | MSVC 2019 | Bazel 3.1.0 | 7.6 | 10.1
   tensorflow_gpu-2.2.0 | 3.5-3.8 | MSVC 2019 | Bazel 2.0.0 | 7.6 | 10.1
   tensorflow_gpu-2.1.0 | 3.5-3.7 | MSVC 2019 | Bazel 0.27.1-0.29.1 | 7.6 | 10.1
   tensorflow_gpu-2.0.0 | 3.5-3.7 | MSVC 2017 | Bazel 0.26.1 | 7.4 | 10
   tensorflow_gpu-1.15.0 | 3.5-3.7 | MSVC 2017 | Bazel 0.26.1 | 7.4 | 10
   tensorflow_gpu-1.14.0 | 3.5-3.7 | MSVC 2017 | Bazel 0.24.1-0.25.2 | 7.4 | 10
   tensorflow_gpu-1.13.0 | 3.5-3.7 | MSVC 2015 update 3 | Bazel 0.19.0-0.21.0 | 7.4 | 10
   tensorflow_gpu-1.12.0 | 3.5-3.6 | MSVC 2015 update 3 | Bazel 0.15.0 | 7.2 | 9.0
   tensorflow_gpu-1.11.0 | 3.5-3.6 | MSVC 2015 update 3 | Bazel 0.15.0 | 7 | 9
   tensorflow_gpu-1.10.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9
   tensorflow_gpu-1.9.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9
   tensorflow_gpu-1.8.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9
   tensorflow_gpu-1.7.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9
   tensorflow_gpu-1.6.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9
   tensorflow_gpu-1.5.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 7 | 9
   tensorflow_gpu-1.4.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 6 | 8
   tensorflow_gpu-1.3.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 6 | 8
   tensorflow_gpu-1.2.0 | 3.5-3.6 | MSVC 2015 update 3 | Cmake v3.6.3 | 5.1 | 8
   tensorflow_gpu-1.1.0 | 3.5 | MSVC 2015 update 3 | Cmake v3.6.3 | 5.1 | 8
   tensorflow_gpu-1.0.0 | 3.5 | MSVC 2015 update 3 | Cmake v3.6.3 | 5.1 | 8

   </div>

   * Download and install the latest **compatible** version

     When I installed, the latest CUDA release was 11.2.0. cuDNN did not yet have a matching version, so I went back one release and used **`CUDA Toolkit 11.1.1`**.

     Download:
     * https://developer.nvidia.com/cuda-toolkit-archive
     * https://developer.nvidia.com/cuda-downloads
   * During installation choose "**Custom**" and skip "**NVIDIA GeForce Experience**"
   * Alternatively, the "**Express**" installation also works.
5. Install the **nVidia cuDNN SDK**

   Download: https://developer.nvidia.com/CUDNN
   * You need to fill in some additional information before you can download it.
   * Download the Windows build that matches your CUDA Toolkit version.
   * Unzip the archive and copy its three subdirectories into the matching subdirectories of the CUDA Toolkit:
     * `bin` → `bin`
     * `lib` → `lib`
     * `include` → `include`
6. Check the environment variables
   * `CUDA_PATH` points to the correct installation path (e.g. `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1`).
   * `CUDA_PATH_Vxx_y` (e.g. `CUDA_PATH_V11_1`) points to the correct installation path.
   * Reboot (required).
7. Verify the installation
   * Open python or a Jupyter notebook and enter:
     ```python
     from tensorflow.python.client import device_lib
     device_lib.list_local_devices()
     ```
   * If you get an error message (the letter after the **timestamp is `W` rather than `I`**), for example:
     ```plaintext
     W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
     ```
     * Fix 1: In `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin` (or the folder for your version), find the file with a **similar name**, `cusolver64_11.dll`. **Copy** it as `cusolver64_10.dll`, the name in the error message. Renaming it also works, but copying keeps the original file for other CUDA 11 programs.
     * Fix 2: Download [`cusolver64_10.dll`](https://drive.google.com/file/d/1-3Yk-QZ1eUta1T40BaxFpO4uyn7BPU4o) and put it in `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin`.
     * Ref: TensorFlow bug reports [#44291](https://github.com/tensorflow/tensorflow/issues/44291), [#44159](https://github.com/tensorflow/tensorflow/issues/44159)

Example Python session (if **GPU** appears near the bottom, the setup works):

```plaintext{ #python_out }
(vGPU) C:\Works\OpenCV>python
Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
2021-01-20 10:20:34.772345: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
>>>
>>> device_lib.list_local_devices()
2021-01-20 10:20:40.590963: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-20 10:20:40.594219: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-20 10:20:41.256771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:2d:00.0 name: GeForce MX330 computeCapability: 6.1
coreClock: 1.594GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 52.21GiB/s
2021-01-20 10:20:41.257120: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-20 10:20:41.264409: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-20 10:20:41.264607: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-20 10:20:41.268513: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-20 10:20:41.270228: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-20 10:20:41.278077: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-20 10:20:41.282008: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-20 10:20:41.282499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-01-20 10:20:42.142988: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-20 10:20:42.143236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-01-20 10:20:42.143350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-01-20 10:20:42.144514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 1342 MB memory) -> physical GPU (device: 0, name: GeForce MX330, pci bus id: 0000:2d:00.0, compute capability: 6.1)
2021-01-20 10:20:42.146561: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5028390217096499099
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 1408043828
locality {
  bus_id: 1
  links {
  }
}
incarnation: 11065352565844271525
physical_device_desc: "device: 0, name: GeForce MX330, pci bus id: 0000:2d:00.0, compute capability: 6.1"
]
>>> exit()
```

<br />

---

# Using the Anaconda for Windows environment
* GPU-enabled combinations currently available in Anaconda:

  No. | Python | TensorFlow | CUDA Toolkit | cuDNN
  :-:|:--:|:---:|:---:|:---:
  1 | 3.6 | 1.9 | 9.0 | 7.0.3 ~ 7.6.5
  2 | 3.6 | 1.13~1.15 | 10.0.130 | 7.3.0 ~ 7.6.5
  3 | 3.7 | 1.13~1.15 | 10.0.130 | 7.3.0 ~ 7.6.5
  4 | 3.7 | 2.1 | 10.1.243 | 7.5.0 ~ 7.6.5
  5 | 3.8 | NA | NA | NA
* Installation commands
  ```cmd
  conda install tensorflow-gpu==1.15
  REM or
  conda install tensorflow==2.1
  ```
  Check that the packages conda lists for installation include `cudatoolkit` and `cudnn`.
* Do not install with `pip install`. It does not pull in `cudatoolkit` or `cudnn`.
* Verify the installation
  * Open python or a Jupyter notebook:
    ```python
    from tensorflow.python.client import device_lib
    device_lib.list_local_devices()
    ```

<br />

---

# WSL2 + docker environment

Starting with **Docker Desktop** 3.1.0, **WSL 2 GPU Paravirtualization (GPU-PV)** is supported on NVIDIA GPUs.
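Two of the requirements listed next can be checked from Python: the Windows build number and the Docker Desktop version. The sketch below is not part of the original setup and assumes Python 3.8+ on the Windows host. It reads the build number through `sys.getwindowsversion()`, which exists only on Windows. It also compares dotted version strings numerically, so `3.10.0` counts as newer than `3.1.0`. The helper names `version_at_least` and `windows_build` are illustrative. The script does not query Docker, so `docker_desktop_version` is a hypothetical value you type in yourself.

```python
import sys
from typing import Optional, Tuple

# Minimum versions taken from the WSL2 + docker requirements in this section.
MIN_DOCKER_DESKTOP = "3.1.0"
MIN_WINDOWS_BUILD = 20150


def _to_tuple(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "3.1.0" into (3, 1, 0)."""
    return tuple(int(part) for part in version.split("."))


def version_at_least(current: str, minimum: str) -> bool:
    """Return True if `current` >= `minimum`, comparing each part numerically."""
    cur, req = _to_tuple(current), _to_tuple(minimum)
    # Pad the shorter tuple with zeros so that "3.1" compares equal to "3.1.0".
    width = max(len(cur), len(req))
    cur += (0,) * (width - len(cur))
    req += (0,) * (width - len(req))
    return cur >= req


def windows_build() -> Optional[int]:
    """Return the Windows build number, or None when not running on Windows."""
    if sys.platform != "win32":
        return None
    return sys.getwindowsversion().build


if __name__ == "__main__":
    build = windows_build()
    if build is None:
        print("Not running on Windows; build check skipped.")
    elif build >= MIN_WINDOWS_BUILD:
        print(f"Windows build {build}: OK")
    else:
        print(f"Windows build {build}: older than {MIN_WINDOWS_BUILD}")

    # Hypothetical example value: replace it with your installed Docker Desktop version.
    docker_desktop_version = "3.1.0"
    print("Docker Desktop OK:", version_at_least(docker_desktop_version, MIN_DOCKER_DESKTOP))
```

Comparing integer tuples avoids the pitfall of plain string comparison. As strings, `"3.10.0" < "3.1.0"` would be true, which would wrongly reject newer releases.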
## System requirements
* **Docker Desktop** upgraded to 3.1.0 (or later)
* Windows 10 **Insider Preview** (build 20150 or later). I am currently on 2004 build ~~21292~~ 21301 (this is **not** the already released **20H2**)
  <div style='width:90%; margin:auto'>
  <img src="https://pic.pimg.tw/magicjackting/1611198471-2120743177-g.png" border="0" title="windows 版本.png" alt="windows 版本.png" />
  <p>Current Windows 10 Insider version (2021/01/21)</p>
  <img src="https://pic.pimg.tw/magicjackting/1611399841-376457745-g.png" border="0" title="windows 版本-1.png" alt="windows 版本-1.png" />
  <p>Current Windows 10 Insider version (2021/01/23)</p>
  </div>
* [Beta Display Drivers](https://developer.nvidia.com/cuda/wsl) from NVIDIA supporting WSL 2 GPU Paravirtualization
  * You must register as an NVIDIA developer and sign in to download them
  * Current GeForce version: 465.21
* Update the WSL 2 **Linux kernel** to the latest version. Open a cmd window **as administrator** and run **`wsl --update`**:
  ```cmd
  wsl --update
  ```
  After the update finishes, restart WSL. Restart Docker as well, because Docker Desktop also runs two WSL distros:
  ```cmd
  wsl --shutdown
  ```
  Simply rebooting the PC is the more straightforward option.
* Make sure the **WSL 2 backend** option is enabled in **Docker Desktop**
  <div style='width:90%; margin:auto'>
  <img src="https://pic.pimg.tw/magicjackting/1611199949-7559806-g_l.png" title="Enable WSL 2 backend.png" alt="Enable WSL 2 backend.png" />
  <p>Enable the WSL 2 backend</p>
  </div>

  You do not need to install `docker` (including `docker-compose`) inside a Linux `distro` to use it there. Just **enable** **WSL Integration** globally, or enable it individually for each `distro` that needs it.
  <div style='width:90%; margin:auto'>
  <img src="https://pic.pimg.tw/magicjackting/1611199949-3572841595-g_l.png" title="Eanble WSL Integration.png" alt="Eanble WSL Integration.png" />
  <p>Enable WSL Integration</p>
  </div>

  Incidentally, the same `docker` and `docker-compose` commands also work in the `git-bash` environment of `git for windows`.

## Verification
* Run the following in a `cmd` console window:
  ```cmd
  docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
  ```
  Naturally, you can also run it from inside a WSL `distro`:
  ```cmd
  wsl bash
  ```
  Once inside, run the same command. I tried `Ubuntu-20.04`, `CentOS7` and `CentOS8`, and all of them worked:
  ```bash
  docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
  ```
* The output looks like this:
  ```plaintext
  Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
      -fullscreen (run n-body simulation in fullscreen mode)
      -fp64 (use double precision floating point values for simulation)
      -hostmem (stores simulation data in host memory)
      -benchmark (run benchmark to measure performance)
      -numbodies=<N> (number of bodies (>= 1) to run in simulation)
      -device=<d> (where d=0,1,2.... for the CUDA device to use)
      -numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
      -compare (compares simulation results running once on the default GPU and once on the CPU)
      -cpu (run n-body simulation on the CPU)
      -tipsy=<file.bin> (load a tipsy model file for simulation)

  NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

  > Windowed mode
  > Simulation data stored in video memory
  > Single precision floating point simulation
  > 1 Devices used for simulation
  GPU Device 0: "GeForce MX330" with compute capability 6.1

  > Compute 6.1 CUDA device: [GeForce MX330]
  3072 bodies, total time for 10 iterations: 2.427 ms
  = 38.888 billion interactions per second
  = 777.752 single-precision GFLOP/s at 20 flops per interaction
  ```

<br />

Check that your GPU appears in the last lines of the output (GeForce MX330 in the example above).

:::danger
The **WSL2 + docker** environment does not work with the **tensorflow/tensorflow** docker images published on the TensorFlow website. Download the docker images you need from https://ngc.nvidia.com/catalog/containers instead.
:::

<br />

---

# Environments not yet supported
* VM environments
* VM + docker environments

"VM" above means VirtualBox or VMware Workstation Pro.

###### tags: `Windows`, `WSL2`, `tensorflow`, `GPU acceleration`, `python`, `Python for Windows`, `docker for windows`